SQL IN and AND clause output

I have written a small query like the one below, and it returns rows.
select user_id
from table tf
where tf.function_id in ('1001051','1001060','1001061')
But when I run the query below, it returns 0 rows, even though I have verified manually that there are user_ids for which all 3 function_ids are present.
select user_id
from table tf
where tf.function_id='1001051'
and
tf.function_id='1001060'
and
tf.function_id='1001061'
It looks very simple to use the AND clause. However, I am not getting the desired output. Am I doing something wrong?
Thanks in advance

Is this what you want to do?
select tf.user_id
from table tf
where tf.function_id in ('1001051', '1001060', '1001061')
group by tf.user_id
having count(distinct tf.function_id) = 3;
This returns users that have all three functions.
EDIT:
This is the query in your comment:
select tu.dealer_id, tu.usr_alias, tf.function_nm
from t_usr tu, t_usr_function tuf, t_function tf
where tu.usr_id = tuf.usr_id and tuf.function_id = tf.function_id and
tf.function_id = '1001051' and tf.function_id = '1001060' and tf.function_id = '1001061' ;
First, you should learn proper join syntax. Simple rule: Never use commas in the from clause.
I think the query you want is:
select tu.dealer_id, tu.usr_alias
from t_usr tu join
t_usr_function tuf
on tu.usr_id = tuf.usr_id
where tuf.function_id in ('1001051', '1001060', '1001061')
group by tu.dealer_id, tu.usr_alias
having count(distinct tuf.function_id) = 3;
This doesn't give you the function name. I'm not sure why you need such detail if all three functions are there for each "user" (or at least dealer/user alias combination). And, the original question doesn't request this level of detail.

Using AND means that a single row must satisfy all of the conditions at once.
In your case, you need to return a row when function_id='1001051' OR function_id='1001060' OR function_id='1001061'.
So, in brief, you need to replace AND with OR.
select user_id from table tf
where tf.function_id='1001051' OR tf.function_id='1001060' OR tf.function_id='1001061'
That's what IN does: it matches when the value equals any one of the listed values.

As I pointed out in the comment, AND is not the right operator since all three conditions together will not be met. Use OR instead,
select user_id from table tf
where tf.function_id='1001051' OR tf.function_id='1001060' OR tf.function_id='1001061'

You're asking for the value to be three different values at the same time, which no single row can satisfy. Use OR instead of AND:
select user_id from table tf
where tf.function_id='1001051' or tf.function_id='1001060' or tf.function_id='1001061'

If all of these things are true:
tf.function_id='1001051'
tf.function_id='1001060'
tf.function_id='1001061'
Then simple algebra tells us this must also be true:
'1001051'='1001060'='1001061'
Since that clearly can't ever be true, your SQL statement's where clause will always resolve to false.
What you want to say is that any of those conditions is true (which is equivalent to in), which means you need to use or:
SELECT user_id
FROM table tf
WHERE tf.function_id = '1001051'
OR tf.function_id = '1001060'
OR tf.function_id = '1001061'
The where clause applies to each row returned by the query. In order to gather data across rows, you either need to join the table to itself enough times to create a single row that satisfies the condition you're looking for or use aggregate functions to consolidate several rows into a single row.
Self-join solution:
SELECT tf1.user_id
FROM table tf1
JOIN table tf2 ON tf1.user_id = tf2.user_id
JOIN table tf3 ON tf1.user_id = tf3.user_id
WHERE tf1.function_id = '1001051'
AND tf2.function_id = '1001060'
AND tf3.function_id = '1001061'
Aggregate solution:
SELECT user_id
FROM table tf
WHERE tf.function_id IN ('1001051', '1001060', '1001061')
GROUP BY user_id
HAVING COUNT (DISTINCT tf.function_id) = 3
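To see the difference between the three forms side by side, here is a minimal sketch using Python's built-in sqlite3 module. The table name user_function and the sample rows are made up for illustration (the question's table is literally called "table", which is a reserved word), so treat it as a demonstration rather than the asker's schema.

```python
import sqlite3

# Hypothetical data: user 1 has all three functions, user 2 has only one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_function (user_id INTEGER, function_id TEXT)")
conn.executemany(
    "INSERT INTO user_function VALUES (?, ?)",
    [(1, "1001051"), (1, "1001060"), (1, "1001061"), (2, "1001051")],
)

# AND on one column: WHERE is checked per row, and no row holds three values.
and_rows = conn.execute(
    """SELECT user_id FROM user_function
       WHERE function_id = '1001051'
         AND function_id = '1001060'
         AND function_id = '1001061'"""
).fetchall()

# IN (same as OR): any row matching one value, so user 2 qualifies too.
in_users = sorted({row[0] for row in conn.execute(
    """SELECT user_id FROM user_function
       WHERE function_id IN ('1001051', '1001060', '1001061')"""
)})

# Aggregate: only users that have all three distinct values.
all_three = [row[0] for row in conn.execute(
    """SELECT user_id FROM user_function
       WHERE function_id IN ('1001051', '1001060', '1001061')
       GROUP BY user_id
       HAVING COUNT(DISTINCT function_id) = 3"""
)]

print(and_rows, in_users, all_three)  # [] [1, 2] [1]
```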

Try this, using SQL IN:
select function_id, user_id from table tf
where tf.function_id in ('1001051','1001060','1001061')

Related

MAX function not working in Oracle statement

I have the following statement using MAX(woq.wq_version) and it keeps returning two results.
SELECT woq.wo_number, woq.quote_amount, MAX(woq.wq_version) version
FROM ba_view_wo_quote woq
LEFT JOIN sm_header smh
ON woq.woo_auto_key = smh.woo_auto_key
WHERE woq.woo_auto_key = smh.woo_auto_key
AND woq.wo_number = 'WO1110885'
AND woq.quote_amount <> '0'
HAVING woq.wq_version = MAX(woq.wq_version)
GROUP BY woq.wq_version, woq.quote_amount, woq.wo_number
I keep receiving these results:
wo_number   quote_amount   version
WO1110885   2803.15        1
WO1110885   1200           2
It sounds like you just want
select woq.wo_number,
woq.quote_amount,
woq.wq_version version
from ba_view_wo_quote woq
left join sm_header smh on woq.woo_auto_key=smh.woo_auto_key
where woq.wo_number = 'WO1110885'
and woq.quote_amount<>'0'
order by woq.wq_version desc
fetch first 1 row only
If that isn't what you're looking for, it would be helpful to update your question with a reproducible test case that shows us what your tables look like, what your data looks like, and what results you want for that data.
Note that it doesn't make sense to duplicate the same condition in the on clause of your join and in the where clause so I got rid of the where clause condition.
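Here is a small sketch of why the grouped query returns two rows, assuming the goal is the single row with the highest wq_version (my reading of the question's MAX). It uses Python's sqlite3 with a made-up wo_quote table holding the two rows from the question; SQLite spells the row limit as LIMIT 1, where Oracle 12c and later uses fetch first 1 row only.

```python
import sqlite3

# Hypothetical stand-in for ba_view_wo_quote with the two rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wo_quote (wo_number TEXT, quote_amount REAL, wq_version INTEGER)"
)
conn.executemany(
    "INSERT INTO wo_quote VALUES (?, ?, ?)",
    [("WO1110885", 2803.15, 1), ("WO1110885", 1200, 2)],
)

# Grouping by wq_version puts every version in its own group, so MAX() is
# taken per group and each version comes back as its own row.
grouped = conn.execute(
    """SELECT wo_number, quote_amount, MAX(wq_version)
       FROM wo_quote
       GROUP BY wq_version, quote_amount, wo_number
       ORDER BY wq_version"""
).fetchall()

# Sorting by version and keeping the first row returns only the latest quote.
latest = conn.execute(
    """SELECT wo_number, quote_amount, wq_version
       FROM wo_quote
       WHERE wo_number = 'WO1110885' AND quote_amount <> 0
       ORDER BY wq_version DESC
       LIMIT 1"""
).fetchone()

print(len(grouped), latest)
```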

Self joining columns from the same table with calculation on one column not displaying column name

I am fairly new to SQL and having issues figuring out how to solve the simple issue below. I have a dataset I am trying to self-join, I am using (b.calendar_year_number -1) as one of the columns to join. I applied a calculation of -1 with the goal of trying to match values from the previous year. However, it is not working as the resulting column shows (No column name) with a screenshot attached below. How do I change the alias to b.calendar_year_number after the calculation?
Code:
SELECT a.day_within_fiscal_period,
a.calendar_month_name,
a.cost_period_rolling_three_month_start_date,
a.calendar_year_number,
b.day_within_fiscal_period,
b.calendar_month_name,
b.cost_period_rolling_three_month_start_date,
(b.calendar_year_number -1)
FROM [data_mart].[v_dim_date_consumer_complaints] AS a
JOIN [data_mart].[v_dim_date_consumer_complaints] AS b
ON b.day_within_fiscal_period = a.day_within_fiscal_period AND
b.calendar_month_name = a.calendar_month_name AND
b.calendar_year_number = a.calendar_year_number
"I am using (b.calendar_year_number -1) as one of the columns to join."
Nope, you're not. Look at your join statement and you'll see the third condition is:
b.calendar_year_number = a.calendar_year_number
So just change that to include the calculation. As far as the 'no column name' issue, you can use colname = somelogic syntax or somelogic as colname. Below, I used the former syntax.
select a.day_within_fiscal_period,
a.calendar_month_name,
a.cost_period_rolling_three_month_start_date,
a.calendar_year_number,
b.day_within_fiscal_period,
b.calendar_month_name,
b.cost_period_rolling_three_month_start_date,
bCalYearNum = b.calendar_year_number
from [data_mart].[v_dim_date_consumer_complaints] a
left join [data_mart].[v_dim_date_consumer_complaints] b
on b.day_within_fiscal_period = a.day_within_fiscal_period
and b.calendar_month_name = a.calendar_month_name
and b.calendar_year_number - 1 = a.calendar_year_number;
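A quick way to check both points (the join condition and the column name) is a toy version of the table. This sketch uses Python's sqlite3 with a hypothetical dim_date table and the standard AS alias syntax, since the colname = expression form is specific to SQL Server. Note which year ends up in b: with b.calendar_year_number - 1 = a.calendar_year_number, b is the row one year after a.

```python
import sqlite3

# Hypothetical, trimmed-down stand-in for v_dim_date_consumer_complaints.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE dim_date (
           calendar_year_number INTEGER,
           calendar_month_name TEXT,
           day_within_fiscal_period INTEGER)"""
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(2020, "January", 1), (2021, "January", 1), (2022, "January", 1)],
)

cursor = conn.execute(
    """SELECT a.calendar_year_number AS a_year,
              b.calendar_year_number AS b_year
       FROM dim_date AS a
       LEFT JOIN dim_date AS b
         ON b.day_within_fiscal_period = a.day_within_fiscal_period
        AND b.calendar_month_name = a.calendar_month_name
        AND b.calendar_year_number - 1 = a.calendar_year_number
       ORDER BY a.calendar_year_number"""
)
column_names = [d[0] for d in cursor.description]  # aliases become column names
rows = cursor.fetchall()

print(column_names)  # ['a_year', 'b_year']
print(rows)          # [(2020, 2021), (2021, 2022), (2022, None)]
```

If b should be the previous year instead, flip the offset so the condition reads b.calendar_year_number = a.calendar_year_number - 1.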
You could use the analytical function LAG/LEAD to get your required result, no self-join necessary:
select a.day_within_fiscal_period,
a.calendar_month_name,
a.cost_period_rolling_three_month_start_date,
a.calendar_year_number,
old_cost_period_rolling_three_month_start_date =
LAG(cost_period_rolling_three_month_start_date) OVER
(PARTITION BY calendar_month_name, day_within_fiscal_period
ORDER BY calendar_year_number),
old_CalYearNum = LAG(calendar_year_number) OVER
(PARTITION BY calendar_month_name, day_within_fiscal_period
ORDER BY calendar_year_number)
from [data_mart].[v_dim_date_consumer_complaints] a

Nested Query Alternatives in AWS Athena

I am running a query that gives a non-overlapping set of first_party_id's - ids that are associated with one third party but not another. This query does not run in Athena, however, giving the error: Correlated queries not yet supported.
Was looking at prestodb docs, https://prestodb.io/docs/current/sql/select.html (Athena is prestodb under the hood), for an alternative to nested queries. The with statement example given doesn't seem to translate well for this not in clause. Wondering what the alternative to a nested query would be - Query below.
SELECT
COUNT(DISTINCT i.third_party_id) AS uniques
FROM
db.ids i
WHERE
i.third_party_type = 'cookie_1'
AND i.first_party_id NOT IN (
SELECT
i.first_party_id
WHERE
i.third_party_id = 'cookie_2'
)
There may be a better way to do this - I would be curious to see it too! One way I can think of would be to use an outer join. (I'm not exactly sure about how your data is structured, so forgive the contrived example, but I hope it would translate ok.) How about this?
with
a as (select *
from (values
(1,'cookie_n',10,'cookie_2'),
(2,'cookie_n',11,'cookie_1'),
(3,'cookie_m',12,'cookie_1'),
(4,'cookie_m',12,'cookie_1'),
(5,'cookie_q',13,'cookie_1'),
(6,'cookie_n',13,'cookie_1'),
(7,'cookie_m',14,'cookie_3')
) as db_ids(first_party_id, first_party_type, third_party_id, third_party_type)
),
b as (select first_party_type
from a where third_party_type = 'cookie_2'),
c as (select a.third_party_id, b.first_party_type as exclude_first_party_type
from a left join b on a.first_party_type = b.first_party_type
where a.third_party_type = 'cookie_1')
select count(distinct third_party_id) from c
where exclude_first_party_type is null;
Hope this helps!
You can use an outer join:
SELECT
COUNT(DISTINCT a.third_party_id) AS uniques
FROM
db.ids a
LEFT JOIN
db.ids b
ON a.first_party_id = b.first_party_id
AND b.third_party_id = 'cookie_2'
WHERE
a.third_party_type = 'cookie_1'
AND b.third_party_id is null -- this line means we select only rows where there is no match
You should also be careful with NOT IN when the subquery may return NULL values. If the subquery returns even one NULL, comparing a.first_party_id to that NULL yields UNKNOWN rather than true or false, so the NOT IN condition is never true and the query returns no rows. Nasty little gotcha.
One way to avoid this is to avoid using NOT IN, or to add a condition to your subquery that filters out NULLs in the selected column, i.e. AND first_party_id IS NOT NULL.
See here for a longer explanation.
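The NULL behaviour is easy to reproduce. Below is a minimal sketch with Python's sqlite3; the ids table and its four rows are invented, and I am assuming the exclusion should be on third_party_type = 'cookie_2' (the question mixes third_party_id and third_party_type). It compares plain NOT IN, NOT IN with the NULLs filtered out, and the LEFT JOIN anti-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ids (first_party_id INTEGER, third_party_id INTEGER, third_party_type TEXT)"
)
conn.executemany(
    "INSERT INTO ids VALUES (?, ?, ?)",
    [
        (1, 10, "cookie_1"),     # only ever seen with cookie_1
        (2, 11, "cookie_1"),
        (2, 12, "cookie_2"),     # first party 2 also has cookie_2
        (None, 13, "cookie_2"),  # NULL first_party_id poisons NOT IN
    ],
)

def count(sql):
    """Run a single-value query and return that value."""
    return conn.execute(sql).fetchone()[0]

# The subquery returns (2, NULL), so NOT IN is never true: 0 rows match.
not_in = count(
    """SELECT COUNT(DISTINCT third_party_id) FROM ids
       WHERE third_party_type = 'cookie_1'
         AND first_party_id NOT IN (
             SELECT first_party_id FROM ids WHERE third_party_type = 'cookie_2')"""
)

# Filtering NULLs out of the subquery restores the expected answer.
not_in_filtered = count(
    """SELECT COUNT(DISTINCT third_party_id) FROM ids
       WHERE third_party_type = 'cookie_1'
         AND first_party_id NOT IN (
             SELECT first_party_id FROM ids
             WHERE third_party_type = 'cookie_2'
               AND first_party_id IS NOT NULL)"""
)

# Anti-join: keep rows of a that found no cookie_2 partner in b.
anti_join = count(
    """SELECT COUNT(DISTINCT a.third_party_id)
       FROM ids a
       LEFT JOIN ids b
         ON a.first_party_id = b.first_party_id
        AND b.third_party_type = 'cookie_2'
       WHERE a.third_party_type = 'cookie_1'
         AND b.first_party_id IS NULL"""
)

print(not_in, not_in_filtered, anti_join)  # 0 1 1
```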

The "where" condition worked not as expected ("or" issue)

I have a problem joining these 4 tables.
Model of my database
I want to count the number of reservations with different sorts (user [mrbs_users.id], room [mrbs_room.room_id], area [mrbs_area.area_id]).
However, when I execute this query (for the user with id = 1):
SELECT count(*)
FROM mrbs_users JOIN mrbs_entry ON mrbs_users.name=mrbs_entry.create_by
JOIN mrbs_room ON mrbs_entry.room_id = mrbs_room.id
JOIN mrbs_area ON mrbs_room.area_id = mrbs_area.id
WHERE mrbs_entry.start_time BETWEEN "145811700" and "1463985000"
or
mrbs_entry.end_time BETWEEN "1458120600" and "1463992200" and mrbs_users.id = 1
The result is the total number of reservations of every user, not just the user who has the id = 1.
So if anyone could help me.. Thanks in advance.
Use parentheses in the where clause whenever you have more than one condition. Your where is parsed as:
WHERE (mrbs_entry.start_time BETWEEN "145811700" and "1463985000" ) or
(mrbs_entry.end_time BETWEEN "1458120600" and "1463992200" and
mrbs_users.id = 1
)
Presumably, you intend:
WHERE (mrbs_entry.start_time BETWEEN 145811700 and 1463985000 or
mrbs_entry.end_time BETWEEN 1458120600 and 1463992200
) and
mrbs_users.id = 1
Also, I removed the quotes around the string constants. It is bad practice to mix data types, and in some databases, the conversion between types can make the query less efficient.
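The precedence rule (AND binds tighter than OR) can be checked on a tiny table. This is only a sketch with Python's sqlite3; the entry table, its three rows, and the small time values are invented to keep the arithmetic readable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (user_id INTEGER, start_time INTEGER, end_time INTEGER)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?)",
    [(1, 100, 200), (2, 100, 200), (2, 900, 950)],
)

# Without parentheses the user filter only applies to the end_time branch,
# so user 2's first reservation is counted through the start_time branch.
unparenthesized = conn.execute(
    """SELECT COUNT(*) FROM entry
       WHERE start_time BETWEEN 50 AND 150
          OR end_time BETWEEN 150 AND 250 AND user_id = 1"""
).fetchone()[0]

# With parentheses the user filter applies to both time ranges.
parenthesized = conn.execute(
    """SELECT COUNT(*) FROM entry
       WHERE (start_time BETWEEN 50 AND 150
              OR end_time BETWEEN 150 AND 250)
         AND user_id = 1"""
).fetchone()[0]

print(unparenthesized, parenthesized)  # 2 1
```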
The problem you've faced is caused by an incorrect WHERE condition.
So it should be:
WHERE (mrbs_entry.start_time BETWEEN 145811700 AND 1463985000 )
OR
(mrbs_entry.end_time BETWEEN 1458120600 AND 1463992200 AND mrbs_users.id = 1)
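Moreover, when you use only INNER JOIN (JOIN), you can move the criteria out of the WHERE clause and into the ON clause. The ON clause is logically evaluated before the WHERE clause, although for inner joins the result is the same either way and most optimizers treat both forms alike, so a speedup is not guaranteed.
In that case, your query would look like this: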
SELECT COUNT(*)
FROM mrbs_users
JOIN mrbs_entry ON mrbs_users.name=mrbs_entry.create_by
JOIN mrbs_room ON mrbs_entry.room_id = mrbs_room.id
AND
(mrbs_entry.start_time BETWEEN 145811700 AND 1463985000
OR ( mrbs_entry.end_time BETWEEN 1458120600 AND 1463992200 AND mrbs_users.id = 1)
)
JOIN mrbs_area ON mrbs_room.area_id = mrbs_area.id

Remove duplicates in a Django query

Is there a simple way to remove duplicates in the following basic query:
email_list = Emails.objects.order_by('email')
I tried using duplicate() but it was not working. What is the exact syntax for doing this query without duplicates?
This query will not give you duplicates - ie, it will give you all the rows in the database, ordered by email.
However, I presume what you mean is that you have duplicate data within your database. Adding distinct() here won't help, because even if you have only one field, you also have an automatic id field - so the combination of id+email is not unique.
Assuming you only need one field, email, de-duplicated, you can do this:
email_list = Email.objects.values_list('email', flat=True).distinct()
However, you should really fix the root problem, and remove the duplicate data from your database.
Example, deleting duplicate Emails by email field:
for email in Email.objects.values_list('email', flat=True).distinct():
    Email.objects.filter(pk__in=Email.objects.filter(email=email).values_list('id', flat=True)[1:]).delete()
Or books by name:
for name in Book.objects.values_list('name', flat=True).distinct():
    Book.objects.filter(pk__in=Book.objects.filter(name=name).values_list('id', flat=True)[1:]).delete()
For checking duplicate you can do a GROUP_BY and HAVING in Django as below. We are using Django annotations here.
from django.db.models import Count
from app.models import Email
duplicate_emails = Email.objects.values('email').annotate(email_count=Count('email')).filter(email_count__gt=1)
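Now loop through the data above and delete every email except the first one (adjust to your requirements).
for data in duplicate_emails:
    email = data['email']
    # delete() cannot be called on a sliced queryset, so collect the extra pks first
    extra_pks = list(Email.objects.filter(email=email).order_by('pk').values_list('pk', flat=True)[1:])
    Email.objects.filter(pk__in=extra_pks).delete()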
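For reference, the same find-then-delete idea can be tried without a Django project. This is a sketch in plain SQL run through Python's sqlite3, with a hypothetical email table (id, email); it keeps the row with the lowest id in each group, which mirrors "keep the first one" above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO email (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("a@example.com",), ("a@example.com",)],
)

# GROUP BY + HAVING: the SQL behind values().annotate(Count()).filter(count__gt=1).
duplicates = conn.execute(
    """SELECT email, COUNT(*) AS email_count
       FROM email
       GROUP BY email
       HAVING COUNT(*) > 1"""
).fetchall()

# Delete every row that is not the lowest id of its email group.
conn.execute(
    "DELETE FROM email WHERE id NOT IN (SELECT MIN(id) FROM email GROUP BY email)"
)
remaining = conn.execute("SELECT id, email FROM email ORDER BY id").fetchall()

print(duplicates)  # [('a@example.com', 3)]
print(remaining)   # [(1, 'a@example.com'), (2, 'b@example.com')]
```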
You can chain .distinct() on the end of your queryset to filter duplicates. Check out: http://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.distinct
You may be able to use the distinct() function, depending on your model. If you only want to retrieve a single field from the model, you could do something like:
email_list = Emails.objects.values_list('email').order_by('email').distinct()
which should give you an ordered list of emails.
You can also use set()
email_list = set(Emails.objects.values_list('email', flat=True))
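One thing to keep in mind with set() is that it drops the ordering you asked for with order_by. A small plain-Python sketch (the list stands in for what values_list('email', flat=True) would yield, and the addresses are made up):

```python
# Stand-in for the values a flat values_list() would produce.
emails = ["b@example.com", "a@example.com", "b@example.com"]

# set() removes duplicates but has no defined order.
unique_unordered = set(emails)

# dict.fromkeys() removes duplicates and keeps first-seen order.
unique_in_order = list(dict.fromkeys(emails))

# sorted(set(...)) removes duplicates and sorts again afterwards.
unique_sorted = sorted(set(emails))

print(unique_in_order)  # ['b@example.com', 'a@example.com']
print(unique_sorted)    # ['a@example.com', 'b@example.com']
```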
Use annotate() with a subquery on the same model:
from django.db.models import Subquery, OuterRef
email_list = Emails.objects.filter(
    pk__in=Emails.objects.values('email').distinct().annotate(
        pk=Subquery(
            Emails.objects.filter(email=OuterRef('email'))
            .order_by('pk')
            .values('pk')[:1]
        )
    ).values_list('pk', flat=True)
)
This queryset generates the following query:
SELECT `email`.`id`,
`email`.`title`,
`email`.`body`,
...
...
FROM `email`
WHERE `email`.`id` IN (
SELECT DISTINCT (
SELECT U0.`id`
FROM `email` U0
WHERE U0.`email` = V0.`email`
ORDER BY U0.`id` ASC
LIMIT 1
) AS `pk`
FROM `email` V0
)
Cheat sheet:
from django.db.models import Subquery, OuterRef
group_by_duplicate_col_queryset = Models.objects.filter(
    pk__in=Models.objects.values('duplicate_col').distinct().annotate(
        pk=Subquery(
            Models.objects.filter(duplicate_col=OuterRef('duplicate_col'))
            .order_by('pk')
            .values('pk')[:1]
        )
    ).values_list('pk', flat=True)
)
I used the following to actually remove the duplicate entries from the database; hopefully this helps someone else. Note that distinct() with field names is only supported on PostgreSQL.
adds = Address.objects.all()
d = adds.distinct('latitude', 'longitude')
for address in adds:
    # keep the one row per (latitude, longitude) that distinct() selected
    if address not in d:
        address.delete()
you can use this raw query : your_model.objects.raw("select * from appname_Your_model group by column_name")