How to join YEARS from 2 tables with different format - sql

I am trying to create a custom view in Excel for a pivot, but I can't join on YEARF because the TB1 format is 2016-17 and the TB2 format is 2016/17. How can I join these? See the code below. What is the appropriate way of doing this?
TB1.YEARF = TB2.YEARF
seems to be the issue:
SELECT TB1.YEARF,
TB1.MC,
TB1.CATEG,
TB1.ID,
TB1.TY,
TB1.CAT,
TB1.LOC,
TB2.HD0_NAME,
TB2.HD1_NAME,
TB2.HD2_NAME,
TB2.HD3_NAME,
TB2.HD4_NAME
FROM DB.TB2 TB2, DB.TB1 TB1
WHERE TB2.CAT = TB1.CAT AND TB2.LOC = TB1.LOC AND TB1.TY = TB2.TY AND TB1.YEARF = TB2.YEARF AND TB1.ID = TB2.ID

It happens that both Oracle and MySQL have REPLACE() functions.
If you change the part of your query that reads
AND TB1.YEARF = TB2.YEARF AND
to
AND TB1.YEARF = REPLACE(TB2.YEARF, '/', '-') AND
you may be able to join these tables. It's not going to be fast.
In general, to do this kind of inexact matching, you have to:
1. Figure out the rules for matching. For example, if you want abcd/ef to match abcd-ef, you can use what I wrote above. If you want abcd/ef to match abcd, that's a different rule.
2. Write a SQL expression to implement your rules.
But, you know, step 1 must come before step 2.
If all you care about is matching the abcd parts of abcd/ef and abcd-ef, you can write a rule for that:
AND SUBSTR(TB1.YEARF, 1, 4) = SUBSTR(TB2.YEARF, 1, 4) AND
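For reference, here is a sketch of the full query with the REPLACE() normalization applied at the join, using the same tables and columns as in the question but explicit JOIN syntax (note that wrapping the join column in a function will usually prevent index use):
SELECT TB1.YEARF,
       TB1.MC,
       TB1.CATEG,
       TB1.ID,
       TB1.TY,
       TB1.CAT,
       TB1.LOC,
       TB2.HD0_NAME,
       TB2.HD1_NAME,
       TB2.HD2_NAME,
       TB2.HD3_NAME,
       TB2.HD4_NAME
FROM DB.TB1 TB1
INNER JOIN DB.TB2 TB2
   ON TB2.CAT = TB1.CAT
  AND TB2.LOC = TB1.LOC
  AND TB2.TY = TB1.TY
  AND TB2.ID = TB1.ID
  AND TB1.YEARF = REPLACE(TB2.YEARF, '/', '-')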

Another answer suggested formatting both sides the same way (Access syntax):
WHERE Format(TB1.YEARF, "yyyy-dd") = Format(TB2.YEARF, "yyyy-dd")

Related

Join under condition in Oracle SQL SELECT statement

I'm trying to join two tables under a condition but I haven't been able to make it work. I've been searching and reading but couldn't find an answer for my case.
This is the basic idea of what I'm trying to achieve (what I wrote in lowercase is what I want to achieve in my own words, in case you wonder):
SELECT TABLE1.STR_FULLNAME, TABLE1.STR_MAILBIZ, TABLE2.STR_IP
FROM TABLE1
INNER JOIN TABLE2
if TABLE2.STR_MACHINEUSER contains 'TEXT\' then join like this:
ON TABLE1.STR_LOGIN = SELECT SUBSTR(STR_MACHINEUSER, 6) AS STR_MACHINEUSER FROM TABLE2 WHERE
STR_MACHINEUSER LIKE 'TEXT\%'
else join like this:
ON TABLE1.STR_LOGIN = TABLE2.STR_MACHINEUSER
ORDER BY TABLE2.DT_INSERT DESC;
The content of Table1.STR_LOGIN and that of Table2.STR_MACHINEUSER are in some cases exactly the same, and in some cases a prefix ('TEXT\') needs to be removed from Table2.STR_MACHINEUSER. I've seen that conditions should be handled with the CASE expression. I have tried different ways, but I couldn't make it work for what I need. I'm thinking that I might need a completely different approach, but I don't see what...
Does someone have a suggestion? Thanks in advance!
Maybe this is what you want:
SELECT TABLE1.STR_FULLNAME, TABLE1.STR_MAILBIZ, TABLE2.STR_IP
FROM TABLE1
INNER JOIN TABLE2 ON
(NOT TABLE2.STR_MACHINEUSER LIKE 'TEXT\%' AND TABLE1.STR_LOGIN = TABLE2.STR_MACHINEUSER)
OR
(TABLE2.STR_MACHINEUSER LIKE 'TEXT\%' AND TABLE1.STR_LOGIN = SUBSTR(TABLE2.STR_MACHINEUSER, 6))
Sure, you could shorten your statement using a CASE construct like:
CASE
  WHEN TABLE2.STR_MACHINEUSER LIKE 'TEXT\%'
  THEN SUBSTR(TABLE2.STR_MACHINEUSER, 6)
  ELSE TABLE2.STR_MACHINEUSER
END
and compare this to TABLE1.STR_LOGIN, but (IMHO) I find the first, purely logical, join more readable.
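For completeness, a sketch of the join written with that CASE expression (same tables and columns as above; the result should match the OR-based join):
SELECT TABLE1.STR_FULLNAME, TABLE1.STR_MAILBIZ, TABLE2.STR_IP
FROM TABLE1
INNER JOIN TABLE2
  ON TABLE1.STR_LOGIN = CASE
                          WHEN TABLE2.STR_MACHINEUSER LIKE 'TEXT\%'
                          THEN SUBSTR(TABLE2.STR_MACHINEUSER, 6)
                          ELSE TABLE2.STR_MACHINEUSER
                        END
ORDER BY TABLE2.DT_INSERT DESC;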

LEFT JOIN in MS-Access with multiple match criteria (AND(OR)) bogging down

I'm working in MS-Access from Office 365.
t1 is a table with about 1,000 rows. I'm trying to LEFT JOIN t1 with t2 where t2 has a little under 200k rows. I'm trying to match up rows using Short Text strings in multiple fields, and all the relevant fields are indexed. The strings are relatively short, with the longest fields (the street fields) being about 15 characters on average.
Here is my query:
SELECT one.ID, two.ACCOUNT
FROM split_lct_2 AS one LEFT JOIN split_parcel AS two
ON (
nz(one.mySTREET) = nz(two.pSTREET)
OR nz(one.mySTREET_2) = nz(two.pSTREET)
OR nz(one.mySTREET_3) = nz(two.pSTREET)
)
AND (nz(one.myDIR) = nz(two.pDIR))
AND (nz(one.myHOUSE) = nz(two.pHOUSE));
The query works, but it behaves like a 3-year-old. The query table appears after several seconds, but remains sluggish indefinitely. For example, selecting a cell in the table takes 3-7 seconds. Exporting the query table as a .dbf takes about 8 minutes.
My concern is that this is just a sample file to build the queries, the actual t1 will have over 200k rows to process.
Is there a way to structure this query that will significantly improve performance?
I don't know if it will help, but
(
nz(one.mySTREET) = nz(two.pSTREET)
OR nz(one.mySTREET_2) = nz(two.pSTREET)
OR nz(one.mySTREET_3) = nz(two.pSTREET)
)
is the same as
nz(two.pSTREET) IN (nz(one.mySTREET),nz(one.mySTREET_2),nz(one.mySTREET_3))
It might be that the optimizer can handle this better.
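For reference, a sketch of the whole query with the OR chain rewritten as IN() (same fields as in the question); Access can be picky about which expressions it accepts in an ON clause, so treat this as something to try rather than a guaranteed improvement:
SELECT one.ID, two.ACCOUNT
FROM split_lct_2 AS one LEFT JOIN split_parcel AS two
ON (
nz(two.pSTREET) IN (nz(one.mySTREET), nz(one.mySTREET_2), nz(one.mySTREET_3))
)
AND (nz(one.myDIR) = nz(two.pDIR))
AND (nz(one.myHOUSE) = nz(two.pHOUSE));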
Definitely, joining tables on text fields is not something you would hope for.
But, life is life.
If there is no possibility of converting the text strings into integers (for example, an additional table with street_name and street_id), try this:
SELECT one.ID, two.ACCOUNT
FROM split_lct_2 AS one LEFT JOIN split_parcel AS two
ON (nz(one.mySTREET) = nz(two.pSTREET))
AND (nz(one.myDIR) = nz(two.pDIR))
AND (nz(one.myHOUSE) = nz(two.pHOUSE))
UNION
SELECT one.ID, two.ACCOUNT
FROM split_lct_2 AS one LEFT JOIN split_parcel AS two
ON (nz(one.mySTREET_2) = nz(two.pSTREET))
AND (nz(one.myDIR) = nz(two.pDIR))
AND (nz(one.myHOUSE) = nz(two.pHOUSE))
UNION
SELECT one.ID, two.ACCOUNT
FROM split_lct_2 AS one LEFT JOIN split_parcel AS two
ON (nz(one.mySTREET_3) = nz(two.pSTREET))
AND (nz(one.myDIR) = nz(two.pDIR))
AND (nz(one.myHOUSE) = nz(two.pHOUSE));
I suppose using Nz() prevents the use of indexes. Try to avoid it. If the data has no NULLs in the join key fields, then Nz() can safely be removed from the query, and that should help. But if the data does have NULLs, you probably need to change this - for example, replace all NULLs with empty strings to make the fields joinable without Nz() - which is additional data processing outside of this query.
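As an illustration of that clean-up step, a minimal sketch in Access SQL, assuming the join fields are Short Text and can be updated in place (one UPDATE per field that may contain NULLs, shown here for two of them):
UPDATE split_parcel SET pSTREET = "" WHERE pSTREET IS NULL;
UPDATE split_lct_2 SET mySTREET = "" WHERE mySTREET IS NULL;
After that, the join conditions can drop Nz() entirely (e.g. one.mySTREET = two.pSTREET), which gives the indexes on those fields a chance to be used.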

Nested Query Alternatives in AWS Athena

I am running a query that gives a non-overlapping set of first_party_id's - ids that are associated with one third party but not another. However, this query does not run in Athena; it gives the error: Correlated queries not yet supported.
I was looking at the Presto docs, https://prestodb.io/docs/current/sql/select.html (Athena is Presto under the hood), for an alternative to nested queries. The WITH statement example given there doesn't seem to translate well to this NOT IN clause. I'm wondering what the alternative to a nested query would be - query below.
SELECT
COUNT(DISTINCT i.third_party_id) AS uniques
FROM
db.ids i
WHERE
i.third_party_type = 'cookie_1'
AND i.first_party_id NOT IN (
SELECT
i.first_party_id
WHERE
i.third_party_id = 'cookie_2'
)
There may be a better way to do this - I would be curious to see it too! One way I can think of would be to use an outer join. (I'm not exactly sure about how your data is structured, so forgive the contrived example, but I hope it would translate ok.) How about this?
with
  a as (
    select *
    from (values
      (1, 'cookie_n', 10, 'cookie_2'),
      (2, 'cookie_n', 11, 'cookie_1'),
      (3, 'cookie_m', 12, 'cookie_1'),
      (4, 'cookie_m', 12, 'cookie_1'),
      (5, 'cookie_q', 13, 'cookie_1'),
      (6, 'cookie_n', 13, 'cookie_1'),
      (7, 'cookie_m', 14, 'cookie_3')
    ) as db_ids(first_party_id, first_party_type, third_party_id, third_party_type)
  ),
  b as (
    select first_party_type
    from a
    where third_party_type = 'cookie_2'
  ),
  c as (
    select a.third_party_id, b.first_party_type as exclude_first_party_type
    from a left join b on a.first_party_type = b.first_party_type
    where a.third_party_type = 'cookie_1'
  )
select count(distinct third_party_id)
from c
where exclude_first_party_type is null;
Hope this helps!
You can use an outer join:
SELECT
COUNT(DISTINCT a.third_party_id) AS uniques
FROM
db.ids a
LEFT JOIN
db.ids b
ON a.first_party_id = b.first_party_id
AND b.third_party_id = 'cookie_2'
WHERE
a.third_party_type = 'cookie_1'
AND b.third_party_id is null -- this line means we select only rows where there is no match
You should also use caution when using NOT IN with a subquery that may return NULL values: if the subquery returns even one NULL, the NOT IN comparison evaluates to UNKNOWN rather than TRUE, and the outer query returns no rows at all. Nasty little gotcha.
One way to avoid this is to avoid NOT IN altogether, or to add a condition to your subquery, i.e. AND first_party_id IS NOT NULL.
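As a sketch, here is the original query rewritten with a non-correlated subquery - the inner SELECT gets its own FROM clause (assuming it is meant to read from the same db.ids table) plus the NULL guard - which avoids the correlated form Athena rejects:
SELECT
  COUNT(DISTINCT a.third_party_id) AS uniques
FROM db.ids a
WHERE a.third_party_type = 'cookie_1'
  AND a.first_party_id NOT IN (
    SELECT b.first_party_id
    FROM db.ids b
    WHERE b.third_party_id = 'cookie_2'
      AND b.first_party_id IS NOT NULL
  )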

Improve sql query without using 'union'

I want to improve this SQL query without using UNION. When table_xx.uniqeId = 1 I want to use table_yy; when it is 2, I want to use table_ww. In fact, I want this to be dynamic: I want to use another table for the uniqeId-to-table mapping, because in the future I could add an extra table, for example for table_xx.uniqeId = 3 use table_qq, or something else. Can you suggest an idea? What can I use?
Thanks.
SELECT
xx.info,
yy.value
FROM
table_xx xx,
table_yy yy
WHERE
xx.uniqeId = 1
UNION ALL
SELECT
xx.info,
yy.value
FROM
table_xx xx,
table_ww yy
WHERE
xx.uniqeId = 2
This would do it, provided that value is mandatory in yy and ww. But as Gordon Linoff already commented, a query like this doesn't necessarily perform better. Most likely it won't, and it isn't much more readable either. I would choose the UNION.
SELECT
xx.info,
nvl(yy.value, ww.value) as value
FROM
table_xx xx
LEFT JOIN table_yy yy ON xx.uniqeId = 1
LEFT JOIN table_ww ww ON xx.uniqeId = 2
WHERE
nvl(yy.value, ww.value) IS NOT NULL
But I think if the introduction of an extra xx_id also requires an extra table, it's a sign that your database structure is incorrect.
A better idea would be to add the uniqueid to the 'value' table as well, so you can easily join the right values on xx. This way, you can add extra ids and matching values as much as you like, without having to modify the database to add extra value tables.
SELECT
xx.info,
yy.value
FROM
table_xx xx
INNER JOIN table_yy yy ON yy.xx_uniqueid = xx.uniqueid
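As an illustration of that suggestion, a minimal sketch of what a consolidated value table could look like (the table and column names here are hypothetical, and the types assume Oracle since the answer above uses nvl()):
CREATE TABLE table_values (
  xx_uniqueid NUMBER NOT NULL,   -- matches table_xx.uniqueid
  value       VARCHAR2(100)
);
Values that used to live in table_yy get xx_uniqueid = 1, those from table_ww get xx_uniqueid = 2, and a future id 3 is just more rows - no new table and no query change:
SELECT xx.info, v.value
FROM table_xx xx
INNER JOIN table_values v ON v.xx_uniqueid = xx.uniqueid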

What approach should I follow if I need to select a data 'EXCEPT' some other bunch of data?

What approach should I follow to construct my SQL query if I need to select some data except some other data?
For example, I want to select all the data from the database EXCEPT this result set:
SELECT *
FROM table1
WHERE table1.MarketTYpe = 'EmergingMarkets'
AND IsBigOne = 1
AND MarketVolume = 'MIDDLE'
AND SomeClass = 'ThirdClass'
Should I use
NOT IN (the above result set)
or should I take the INVERSE of the conditions, like != instead of =, etc.?
Or something else?
Can you advise?
Use the EXCEPT construct?
SELECT *
FROM table1
EXCEPT
SELECT *
FROM table1
WHERE table1.MarketTYpe = 'EmergingMarkets'
AND IsBigOne = 1
AND MarketVolume = 'MIDDLE'
AND SomeClass = 'ThirdClass'
Note that EXCEPT and NOT EXISTS give the same query plan using "left anti semi joins".
NOT IN (subquery with the above) may not give correct results if there are NULL values in the sub-query, hence I wouldn't use it.
I would avoid negation in the WHERE clause because it isn't readable straight away, as the comments on Michael's answer show...
For more on "all rows except some rows", see these:
Combining datasets with EXCEPT versus checking on IS NULL in a LEFT JOIN
To take out those dept who has no employees assigned to it
SQL NOT IN possibly performance issues
(DBA.SE) https://dba.stackexchange.com/questions/4009/the-use-of-not-logic-in-relation-to-indexes/4010#4010
What database engine?
Minus operator in ORACLE
Except operator in SQL Server
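Since the answer notes that EXCEPT and NOT EXISTS compile to the same anti-join, here is a sketch of the NOT EXISTS form, assuming table1 has some key column to correlate on (Id here is hypothetical):
SELECT t.*
FROM table1 t
WHERE NOT EXISTS (
    SELECT 1
    FROM table1 x
    WHERE x.Id = t.Id
      AND x.MarketTYpe = 'EmergingMarkets'
      AND x.IsBigOne = 1
      AND x.MarketVolume = 'MIDDLE'
      AND x.SomeClass = 'ThirdClass'
)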
The simplest and probably fastest option here is to invert the conditions:
SELECT *
FROM table1
WHERE table1.MarketTYpe <> 'EmergingMarkets'
OR IsBigOne <> 1
OR MarketVolume <> 'MIDDLE'
OR SomeClass <> 'ThirdClass'
This is likely to use far fewer resources than doing a NOT IN(). You may wish to benchmark them to be certain, but the above is likely to be faster.
Use NOT IN because that makes it clear that you want the set in the main select statement excluding the subset in the NOT IN select statement.
I like gbn's answer, but another way of doing it would be:
SELECT *
FROM table1
WHERE NOT (table1.MarketTYpe = 'EmergingMarkets'
AND IsBigOne = 1
AND MarketVolume = 'MIDDLE'
AND SomeClass = 'ThirdClass')