SQL query with many 'AND NOT CONTAINS' statements

SQL query with many 'AND NOT CONTAINS' statements - sql

I am trying to exclude timezones that have a substring in them so I only have records likely from the US.
The query works fine (e.g., the first line after the OR will remove local_timezones that include 'Africa/Abidjan'), but there's got to be a better way to write it.
It's too verbose, repetitive, and I suspect it's slower than it could be. Any advice greatly appreciated. (I'm using Snowflake's flavor of SQL but not sure that matters in this case).
NOTE: I'd like to keep a timezone such as America/Los_Angeles, but not America/El_Salvador, so for this reason I don't think wildcards are a good solution.
SELECT a_col
FROM a_table
WHERE
(country = 'United States')
OR
((country is NULL and not contains (local_timezone, 'Africa')
AND
country is NULL and not contains (local_timezone, 'Asia')
AND
country is NULL and not contains (local_timezone, 'Atlantic')
AND
country is NULL and not contains (local_timezone, 'Australia')
AND
country is NULL and not contains (local_timezone, 'Etc')
AND
country is NULL and not contains (local_timezone, 'Europe')
AND
country is NULL and not contains (local_timezone, 'Araguaina')
etc etc

If you have a known list of "good things" I would make a table, and then just JOIN to id. Here I made you a list of good timezones:
CREATE TABLE acceptable_timezone (tz_name text) AS
SELECT * FROM VALUES
('Pacific/Auckland'),
('Pacific/Fiji'),
('Pacific/Tahiti');
I love me some Pacific... now we have some important data in a CTE
WITH data(id, timezone) AS (
SELECT * FROM VALUES
(1, 'Pacific/Auckland'),
(2, 'Pacific/Fiji'),
(3, 'America/El_Salvador')
)
SELECT d.*
FROM data AS d
JOIN acceptable_timezone AS a
ON a.tz_name = d.timezone
ORDER BY 1;
which total does not match the El Salvador:
ID
TIMEZONE
1
Pacific/Auckland
2
Pacific/Fiji
You cannot get much faster than an equijoin, but if your DATA has the timezones as substrings, then the TABLE can have the wildcard matches % and you can use a LIKE just like Felipe's answer does but as
JOIN acceptable_timezone AS a
ON d.timezone LIKE a.tz_name

You can use LIKE ANY:
with data as
(select null country, 'something Australia maybe' local_timezone)
select *
from data
where country = 'United States'
or (
country is null
and not local_timezone like any ('%Australia%', '%Africa%', '%Atlantic%')
)

Related

Create a hardcoded "mapping table" in Trino SQL

I have a query (several CTEs) that get data from different sources. The output has a column name, but I would like to map this nameg to a more user-friendly name.
Id
name
1
buy
2
send
3
paid
I would like to hard code somewhere in the query (in another CTE?) a mapping table. Don't want to create a separate table for it, just plain text.
name_map=[('buy', 'Item purchased'),('send', 'Parcel in transit'), ('paid', 'Charge processed')]
So output table would be:
Id
name
1
Item purchased
2
Parcel in transit
3
Charge processed
In Trino I see the function map_from_entries and element_at, but don't know if they could work in this case.
I know "case when" might work, but if possible, a mapping table would be more convenient.
Thanks

As a simpler alternative to the other answer, you don't actually need to create an intermediate map using map_from_entries and look up values using element_at. You can just create an inline mapping table with VALUES and use a regular JOIN to do the lookups:
WITH mapping(name, description) AS (
VALUES
('buy', 'Item purchased'),
('send', 'Parcel in transit'),
('paid', 'Charge processed')
)
SELECT description
FROM t JOIN mapping ON t.name = mapping.name
(The query assumes your data is in a table named t that contains a column named name to use for the lookup)

Super interesting idea, and I think I got it working:
with tmp as (
SELECT *
FROM (VALUES ('1', 'buy'),
('2', 'send'),
('3', 'paid')) as t(id, name)
)
SELECT element_at(name_map, name) as name
FROM tmp
JOIN (VALUES map_from_entries(
ARRAY[('buy', 'Item purchased'),
('send', 'Parcel in transit'),
('paid', 'Charge processed')])) as t(name_map) ON TRUE
Output:
name
Item purchased
Parcel in transit
Charge processed
To see a bit more of what's happening, we can look at:
SELECT *, element_at(name_map, name) as name
id
name
name_map
name
1
buy
{buy=Item purchased, paid=Charge processed, send=Parcel in transit}
Item purchased
2
send
{buy=Item purchased, paid=Charge processed, send=Parcel in transit}
Parcel in transit
3
paid
{buy=Item purchased, paid=Charge processed, send=Parcel in transit}
Charge processed
I'm not sure how efficient this is, but it's certainly an interesting idea.

SELECT DISTINCT to return at most one row

Given the following db structure:
Regions
id
name
1
EU
2
US
3
SEA
Customers:
id
name
region
1
peter
1
2
henry
1
3
john
2
There is also a PL/pgSQL function in place, defined as sendShipment() which takes (among other things) a sender and a receiver customer ID.
There is a business constraint around this which requires us to verify that both sender and receiver sit in the same region - and we need to do this as part of sendShipment(). So from within this function, we need to query the customer table for both the sender and receiver ID and verify that both their region ID is identical. We will also need to ID itself for further processing down the line.
So maybe something like this:
SELECT DISTINCT region FROM customers WHERE id IN (?, ?)
The problem with this is that the result will be either an array (if the customers are not within the same region) or a single value.
Is there are more elegant way of solving this constraint? I was thinking of SELECT INTO and use a temporary table, or I could SELECT COUNT(DISTINCT region) and then do another SELECT for the actual value if the count is less than 2, but I'd like to avoid the performance hit if possible.

There is also a PL/pgSQL function in place, defined as sendShipment() which takes (among other things) a sender and a receiver customer ID.
There is a business constraint around this which requires us to verify that both sender and receiver sit in the same region - and we need to do this as part of sendShipment(). So from within this function, we need to query the customer table for both the sender and receiver ID and verify that both their region ID is identical. We will also need to ID itself for further processing down the line.
This query should work:
WITH q AS (
SELECT
COUNT( * ) AS CountCustomers,
COUNT( DISTINCT c.Region ) AS CountDistinctRegions,
-- MIN( c.Region ) AS MinRegion
FIRST_VALUE( c.Region ) OVER ( ORDER BY c.Region ) AS MinRegion
FROM
Customers AS c
WHERE
c.CustomerId = $senderCustomerId
OR
c.CustomerId = $receiverCustomerId
)
SELECT
CASE WHEN q.CountCustomers = 2 AND q.CountDistinctRegions = 2 THEN 'OK' ELSE 'BAD' END AS "Status",
CASE WHEN q.CountDistinctRegions = 2 THEN q.MinRegion END AS SingleRegion
FROM
q
The above query will always return a single row with 2 columns: Status and SingleRegion.
SQL doesn't have a "SINGLE( col )" aggregate function (i.e. a function that is NULL unless the aggregation group has a single row), but we can abuse MIN (or MAX) with a CASE WHEN COUNT() in a CTE or derived-table as an equivalent operation.
Alternatively, windowing-functions could be used, but annoyingly they don't work in GROUP BY queries despite being so similar, argh.
Once again, this is the ISO SQL committee's fault, not PostgreSQL's.
As your Region column is UUID you cannot use it with MIN, but I understand it should work with FIRST_VALUE( c.Region ) OVER ( ORDER BY c.Region ) AS MinRegion.
As for the columns:
The Status column is either 'OK' or 'BAD' based on those business-constraints you mentioned. You might want to change it to a bit column instead of a textual one, though.
The SingleRegion column will be NOT NULL (with a valid region) if CountDistinctRegions = 2 regardless of CountCustomers, but feel free to change that, just-in-case you still want that info.

For anybody else who's interested in a simple solution, I finally came up with the (kind of obvious) way to do it:
SELECT
r.region
FROM
customers s
INNER JOIN customers r ON
s.region = r.region
WHERE s.id = 'sender_id' and r.id = 'receiver_id';
Huge credit to SELECT DISTINCT to return at most one row who helped me out a lot on this and also posted a viable solution.

Select only the row with the max value, but the column with this info is a SUM()

I have the following query:
SELECT DISTINCT
CAB.CODPARC,
PAR.RAZAOSOCIAL,
BAI.NOMEBAI,
SUM(VLRNOTA) AS AMOUNT
FROM TGFCAB CAB, TGFPAR PAR, TSIBAI BAI
WHERE CAB.CODPARC = PAR.CODPARC
AND PAR.CODBAI = BAI.CODBAI
AND CAB.TIPMOV = 'V'
AND STATUSNOTA = 'L'
AND PAR.CODCID = 5358
GROUP BY
CAB.CODPARC,
PAR.RAZAOSOCIAL,
BAI.NOMEBAI
Which the result is this. Company names and neighborhood hid for obvious reasons
The query at the moment, for those who don't understand Latin languages, is giving me clients, company name, company neighborhood, and the total value of movements.
in the WHERE clause it is only filtering sales movements of companies from an established city.
But if you notice in the Select statement, the column that is retuning the value that aggregates the total amount of value of sales is a SUM().
My goal is to return only the company that have the maximum value of this column, if its a tie, display both of em.
This is where i'm struggling, cause i can't seem to find a simple solution. I tried to use
WHERE AMOUNT = MAX(AMOUNT)
But as expected it didn't work

You tagged the question with the whole bunch of different databases; do you really use all of them?
Because, "PL/SQL" reads as "Oracle". If that's so, here's one option.
with temp as
-- this is your current query
(select columns,
sum(vrlnota) as amount
from ...
where ...
)
-- query that returns what you asked for
select *
from temp t
where t.amount = (select max(a.amount)
from temp a
);

You should be able to achieve the same without the need for a subquery using window over() function,
WITH T AS (
SELECT
CAB.CODPARC,
PAR.RAZAOSOCIAL,
BAI.NOMEBAI,
SUM(VLRNOTA) AS AMOUNT,
MAX(VLRNOTA) over() AS MAMOUNT
FROM TGFCAB CAB
JOIN TGFPAR PAR ON PAR.CODPARC = CAB.CODPARC
JOIN TSIBAI BAI ON BAI.CODBAI = PAR.CODBAI
WHERE CAB.TIPMOV = 'V'
AND STATUSNOTA = 'L'
AND PAR.CODCID = 5358
GROUP BY CAB.CODPARC, PAR.RAZAOSOCIAL, BAI.NOMEBAI
)
SELECT CODPARC, RAZAOSOCIAL, NOMEBAI, AMOUNT
FROM T
WHERE AMOUNT=MAMOUNT
Note it's usually (always) beneficial to join tables using clear explicit join syntax. This should be fine cross-platform between Oracle & SQL Server.

SQL query that finds data point based on exclusion from another data point

Not sure how helpful title is, so let me get straight to it.
Below is a query (and proceeding result set) to give you an idea of what I'm working with:
select PRACT_ID, ID_Number, DocumentName
from Practitioner_ID_Numbers
where PRACT_ID = 1193
PRACT_ID ID_Number DocumentName
1193 H9704 State License
1193 BR1918804 DEA Number
1193 10080428 Controlled Substance
1193 E51693 Medicare UPIN
1193 00419V Medicare Provider
1193 None Medicaid Provider
Pract_ID = unique identifier of person
ID_Number = identifying number associated to document
DocumentName = identifies type of document (for example, id_number could be (555)555-1234 and documentname would be 'phone number')
So what I need to do is write a query that identifies all pract_id's who have no entry for documentname type 'NPI number'.

You can use aggregation:
select PRACT_ID
from Practitioner_ID_Numbers
group by PRACT_ID
having sum(case when DocumentName = 'NPI number' then 1 else 0 end) = 0;

Exclusion is easily solved with NOT EXISTS:
select distinct PRACT_ID
from Practitioner_ID_Numbers p
where not exists (
select 1 from Practitioner_ID_Numbers
where PRACT_ID = p.PRACT_ID and DocumentName = 'NPI number'
)

You can join a record-set of practitioners to Practitioner_ID_Numbers to figure out which practitioners don't have an NPI.
Here, I'm using a CTE based on Practitioner_ID_Numbers, but if you have a separate table that stores practitioners, you can use that instead:
WITH
Practitioners(PRACT_ID) AS
(
SELECT DISTINCT PRACT_ID FROM Practitioner_ID_Numbers
)
SELECT Practitioners.PRACT_ID
FROM Practitioners
LEFT JOIN Practitioner_ID_Numbers AS ProviderNPIs ON
ProviderNPIs.PRACT_ID = Practitioners.PRACT_ID
AND ProviderNPIs.DocumentName = 'NPI number'
WHERE ProviderNPIs.PRACT_ID IS NULL
Note that in the JOIN, we indicate that we're only concerned with records where the DocumentName is "NPI number". We then indicate in the WHERE clause that we want records with a null PRACT_ID. This is how we determine which practitioners are missing an NPI.
A couple of additional notes:
Your naming conventions are a bit wonky: Inconsistent capitalization use of underscores can make queries tricky (did that field have an underscore or not?)
Why would you need to store "None" as an ID number? If the provider was missing an ID, wouldn't the value be NULL (or simply not exist in the table)?
Hopefully, your ID types (State License, DEA Number, etc.) are stored in a separate lookup table.
You might want to reconsider the attribute name DocumentName. Something like ID_Type might be more appropriate.

Comparing SQL Queries

I'm considering two SQL queries (Oracle) and I shall state the difference between them by showing examples. The queries are as follows:
/* Query 1 */
SELECT DISTINCT countryCode
FROM Member M
WHERE NOT EXISTS(
(SELECT organisation FROM Member
WHERE countryCode = 'US')
MINUS
(SELECT organisation FROM Member
WHERE countryCode = M.countryCode ) )
/* Query 2 */
SELECT DISTINCT M1.countryCode
FROM Member M1, Member M2
WHERE M2.countryCode = 'US' AND M1.organisation = M2.organisation
GROUP BY M1.countryCode
HAVING COUNT(M1.organisation) = (
SELECT COUNT(M3.organisation)
FROM Member M3 WHERE M3.countryCode = 'US' )
As far as I get it, these queries give back the countries which are members of the same organisations as the United States. The scheme of Member is (countryCode, organisation, type) with bold ones as primary key. Example: ('US', 'UN', 'member'). The member table contains only a few tuples and is not complete, so when executing (1) and (2) both yield the same result (e.g. Germany, since here only 'UN' and 'G7' are in the table).
So how can I show that these queries can actually return different results?
That means how can I create an example table instance of Member such that the queries yield different results based on that table instance?
Thanks for your time and effort.

The queries will result all the country codes which are members at least with all the organization the US is member with (it could be member with other organizations as well).

I've finally found an example to show that they can actually output different values based on the same Member instance. This is actually the case when Member contains duplicates. For query 1 this is not a problem, but for query 2 it actually affects the result, since here the number of memberships is crucial. So, if you have e.g. ('FR', 'UN', member) twice in Member the HAVING COUNT(M1.organisation) will return a different value as SELECT COUNT(M3.organisation) and 'FR' would not be part of the output.
Thanks to all for your constructive suggestions, that helped me a lot.

The first query would return countries whose list of memberships is longer than that of the US. It does require they include the same organizations as US but it could be more.
The second one requires the two membership lists to be identical.
As for creating an example with real data, start with an empty table and add this row:
insert into Member (countryCode, organisation)
values ('Elbonia', 'League of Fictitious Nations')
By the way a full outer join would let you characterize the difference symmetrically:
select
mo.countryCode || ' ' ||
case
when count(case when mu.organisation is null then 1 else null end) > 0
and count(case when mo.organisation is null then 1 else null end) > 0
then 'and US both have individual memberships they that do not have in common.'
when count(case when mo.organisation is null then 1 else null end) > 0
then 'is a member of some organisations that US is not a member of.'
when count(case when mo.organisation is null then 1 else null end) > 0
then 'is not a member of some organisations that US is a member of.'
else 'has identical membership as US.'
end
from
(select * from Member where countryCode = 'US') mu
full outer join
(select * from Member where countryCode = '??') mo
on mo.organisation = mu.organisation
Please forgive the dangling prepositions.
And a disk note, though duplicate rows are not allowed in normalized data, this query has no problem with those.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL query with many 'AND NOT CONTAINS' statements - sql

You can use LIKE ANY: with data as (select null country, 'something Australia maybe' local_timezone) select * from data where country = 'United States' or ( country is null and not local_timezone like any ('%Australia%', '%Africa%', '%Atlantic%') )

Related

Create a hardcoded "mapping table" in Trino SQL

SELECT DISTINCT to return at most one row

Select only the row with the max value, but the column with this info is a SUM()

SQL query that finds data point based on exclusion from another data point

Comparing SQL Queries

Categories

Resources