select records where condition is true in one record - sql

I need to select cid, project, and owner from rows in the table below where one or more rows for a cid/project combination has an owner of 1.
cid | project | phase | task | owner
-----------------------------------
1 | 1 | 1 | 1 | 1
1 | 1 | 1 | 2 | 2
1 | 1 | 1 | 3 | 2
2 | 1 | 1 | 1 | 1
2 | 1 | 1 | 2 | 1
3 | 1 | 1 | 3 | 2
My output table should look like this:
cid | project | phase | task | owner
-----------------------------------
1 | 1 | 1 | 1 | 1
1 | 1 | 1 | 2 | 2
1 | 1 | 1 | 3 | 2
2 | 1 | 1 | 1 | 1
2 | 1 | 1 | 2 | 1
The below query is what I came up with. It does seem to test okay, but my confidence is low. Is the query an effective way to solve the problem?
select task1.cid, task1.project, task1.owner
from
(select cid, project, owner from table) task1
right join
(select distinct cid, project, owner from table where owner = 1) task2
on task1.cid = task2.cid and task1.project = task2.project
(I did not remove the phase and task columns from the sample output so that it would be easier to compare.)

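You can simply use an IN clause (note that this matches on cid only, so it assumes a cid never has a second project without an owner of 1)
select cid, project, owner
from table
where cid in (select cid from table where owner = 1)
or an inner join with a subquery, which matches on the cid/project combination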
select a.cid, a.project, a.owner
from table a
INNER JOIN ( select distinct cid , project
from table where owner = 1
) t on t.cid = a.cid and t.project = a.project
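As a quick check of the inner join version, here is a minimal sketch that runs it against the sample rows using Python's built-in sqlite3 module. The table name tasks is an assumption (the question only calls it "table"), and the ORDER BY is added purely to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "tasks" is a hypothetical name; the question does not name the table
conn.execute("create table tasks (cid int, project int, phase int, task int, owner int)")
conn.executemany(
    "insert into tasks values (?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 1), (1, 1, 1, 2, 2), (1, 1, 1, 3, 2),
     (2, 1, 1, 1, 1), (2, 1, 1, 2, 1), (3, 1, 1, 3, 2)],
)

# Keep every row whose cid/project pair has at least one row with owner = 1
rows = conn.execute("""
    select a.cid, a.project, a.owner
    from tasks a
    inner join (select distinct cid, project
                from tasks where owner = 1) t
        on t.cid = a.cid and t.project = a.project
    order by a.cid, a.task
""").fetchall()

for row in rows:
    print(row)
```

The three rows for cid 1 and the two rows for cid 2 come back, while cid 3 is dropped because none of its rows has owner 1.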

Related

Postgres - Unique values for id column using CTE, Joins alongside GROUP BY

I have a table referrals:
id | user_id_owner | firstname | is_active | user_type | referred_at
----+---------------+-----------+-----------+-----------+-------------
3 | 2 | c | t | agent | 3
5 | 3 | e | f | customer | 5
4 | 1 | d | t | agent | 4
2 | 1 | b | f | agent | 2
1 | 1 | a | t | agent | 1
And another table activations
id | user_id_owner | referral_id | amount_earned | activated_at | app_id
----+---------------+-------------+---------------+--------------+--------
2 | 2 | 3 | 3.0 | 3 | a
4 | 1 | 1 | 6.0 | 5 | b
5 | 4 | 4 | 3.0 | 6 | c
1 | 1 | 2 | 2.0 | 2 | b
3 | 1 | 2 | 5.0 | 4 | b
6 | 1 | 2 | 7.0 | 8 | a
I am trying to generate another table from the two tables that has only unique values for referrals.id and returns as one of the columns the count for each apps as best_selling_app_count.
Here is the query I ran:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select id, app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id )
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
Here is the result I got:
id | activations_count | amount_earned | referred_at | last_activated_at | id | best_selling_app | best_selling_app_count | best_selling_app_rank
----+-------------------+---------------+-------------+-------------------+----+------------------+------------------------+-----------------------
2 | 3 | 14.0 | 2 | 8 | 2 | b | 2 | 1
1 | 1 | 6.0 | 1 | 5 | 1 | b | 1 | 2
2 | 3 | 14.0 | 2 | 8 | 2 | a | 1 | 2
4 | 1 | 3.0 | 4 | 6 | 4 | c | 1 | 2
The problem with this result is that the table has a duplicate id of 2. I only need unique values for the id column.
I tried a workaround using distinct on that gave the desired result, but I fear the query results may not be reliable and consistent.
Here is the workaround query:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select
distinct on (id) id, app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id
order by id, best_selling_app_count desc)
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
I need a recommendation on how best to achieve this.
Quoting the question: "I am trying to generate another table from the two tables that has only unique values for referrals.id and returns as one of the columns the count for each apps as best_selling_app_count."
Your question is wrapped in a very complicated SQL query, but the sentence quoted above looks like the actual question. If so, you can use:
select r.*,
a.app_id as most_common_app_id,
a.cnt as most_common_app_id_count
from referrals r left join
(select distinct on (a.referral_id) a.referral_id, a.app_id, count(*) as cnt
from activations a
group by a.referral_id, a.app_id
order by a.referral_id, count(*) desc
) a
on a.referral_id = r.id;
You have not explained the other columns that are in your result set.
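DISTINCT ON is specific to Postgres. As a hedged illustration of the same "most common app per referral" idea, the sketch below uses row_number() instead and runs on SQLite through Python's sqlite3 module (this assumes a SQLite build with window functions, 3.25 or later). Only the columns needed for the count are created, and the tie-break on app_id is an assumption, since the question does not say how ties should be resolved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table referrals (id int, user_id_owner int)")
conn.execute("create table activations (id int, referral_id int, app_id text)")
conn.executemany("insert into referrals values (?, ?)",
                 [(3, 2), (5, 3), (4, 1), (2, 1), (1, 1)])
conn.executemany("insert into activations values (?, ?, ?)",
                 [(2, 3, "a"), (4, 1, "b"), (5, 4, "c"),
                  (1, 2, "b"), (3, 2, "b"), (6, 2, "a")])

# Count activations per (referral, app), rank the apps within each referral,
# and keep only the top-ranked app so each referral id appears once.
rows = conn.execute("""
    select r.id, c.app_id, c.cnt
    from referrals r
    left join (
        select referral_id, app_id, cnt,
               row_number() over (partition by referral_id
                                  order by cnt desc, app_id) as rn
        from (select referral_id, app_id, count(*) as cnt
              from activations
              group by referral_id, app_id)
    ) c on c.referral_id = r.id and c.rn = 1
    order by r.id
""").fetchall()

for row in rows:
    print(row)
```

Referral 2 appears once, with app b and a count of 2, and referral 5 (which has no activations) is kept with NULLs because of the left join.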

Efficient ROW_NUMBER increment when column matches value

I'm trying to find an efficient way to derive the column Expected below from only Id and State. What I want is for the number Expected to increase each time State is 0 (ordered by Id).
+----+-------+----------+
| Id | State | Expected |
+----+-------+----------+
| 1 | 0 | 1 |
| 2 | 1 | 1 |
| 3 | 0 | 2 |
| 4 | 1 | 2 |
| 5 | 4 | 2 |
| 6 | 2 | 2 |
| 7 | 3 | 2 |
| 8 | 0 | 3 |
| 9 | 5 | 3 |
| 10 | 3 | 3 |
| 11 | 1 | 3 |
+----+-------+----------+
I have managed to accomplish this with the following SQL, but the execution time is very poor when the data set is large:
WITH Groups AS
(
SELECT Id, ROW_NUMBER() OVER (ORDER BY Id) AS GroupId FROM tblState WHERE State=0
)
SELECT S.Id, S.[State], S.Expected, G.GroupId FROM tblState S
OUTER APPLY (SELECT TOP 1 GroupId FROM Groups WHERE Groups.Id <= S.Id ORDER BY Id DESC) G
Is there a simpler and more efficient way to produce this result? (In SQL Server 2012 or later)
Just use a cumulative sum:
select s.*,
sum(case when state = 0 then 1 else 0 end) over (order by id) as expected
from tblState s;
Another method uses a correlated subquery:
select t.*,
(select count(*)
from tblState t1
where t1.id <= t.id and t1.state = 0
) as expected
from tblState t;
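To check the running-count idea against the Expected column from the question, here is a small sketch using Python's sqlite3 module (assuming a SQLite build with window functions, 3.25 or later, rather than SQL Server). It runs both the windowed sum and a correlated count that uses an inclusive comparison, so that a row with State = 0 counts itself.

```python
import sqlite3

states = [0, 1, 0, 1, 4, 2, 3, 0, 5, 3, 1]            # State column, Id 1..11
expected = [1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3]          # Expected column

conn = sqlite3.connect(":memory:")
conn.execute("create table tblState (Id int, State int)")
conn.executemany("insert into tblState values (?, ?)",
                 list(enumerate(states, start=1)))

# Cumulative sum: add 1 for every State = 0 row seen so far (ordered by Id)
window = [r[1] for r in conn.execute("""
    select Id,
           sum(case when State = 0 then 1 else 0 end) over (order by Id)
    from tblState
    order by Id
""")]

# Correlated count of State = 0 rows up to and including the current Id
subquery = [r[1] for r in conn.execute("""
    select t.Id,
           (select count(*) from tblState t1
            where t1.Id <= t.Id and t1.State = 0)
    from tblState t
    order by t.Id
""")]

print(window == expected, subquery == expected)
```

Both forms produce the same numbers on this data. The windowed sum reads the table once, whereas the correlated count is evaluated per row, which is the likely reason the window version scales better.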

Best Hive SQL query for this

I have two tables like the ones below. I'm running a Hive query, and window functions seem pretty limited in Hive.
Table dept
id | name |
1 | a |
2 | b |
3 | c |
4 | d |
Table time (built by a heavy query, so the process becomes very slow if I need to join to another newly created copy of the time table)
id | date | first | last |
1 | 1992-01-01 | 1 | 1 |
2 | 1993-02-02 | 1 | 2 |
2 | 1993-03-03 | 2 | 1 |
3 | 1993-01-01 | 1 | 3 |
3 | 1994-01-01 | 2 | 2 |
3 | 1995-01-01 | 3 | 1 |
I need to retrieve something like this:
SELECT d.id,d.name,
t.date AS firstdate,
td.date AS lastdate
FROM dbo.dept d LEFT JOIN dbo.time t ON d.id=t.id AND t.first=1
LEFT JOIN time td ON d.id=td.id AND td.last=1
What is the most optimized way to do this?
A GROUP BY operation over a UNION ALL, which will be done in a single map-reduce job:
select id
,max(name) as name
,max(case when first = 1 then `date` end) as firstdate
,max(case when last = 1 then `date` end) as lastdate
from (select id
,null as name
,`date`
,first
,last
from time
where first = 1
or last = 1
union all
select id
,name
,null as `date`
,null as first
,null as last
from dept
) t
group by id
;
+----+------+------------+------------+
| id | name | firstdate | lastdate |
+----+------+------------+------------+
| 1 | a | 1992-01-01 | 1992-01-01 |
| 2 | b | 1993-02-02 | 1993-03-03 |
| 3 | c | 1993-01-01 | 1995-01-01 |
| 4 | d | (null) | (null) |
+----+------+------------+------------+
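Alternatively, a left join with conditional aggregation (the first/last filter is applied before the join; in a WHERE clause after the left join it would drop dept 4, which has no time rows):
select d.id
,max(d.name) as name
,max(case when t.first = 1 then t.`date` end) as firstdate
,max(case when t.last = 1 then t.`date` end) as lastdate
from dept d left join
(select id, `date`, first, last from time where first = 1 or last = 1) t
on d.id = t.id
group by d.id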
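The join-plus-conditional-aggregation idea can be checked on the sample data with the sketch below. It runs on SQLite through Python's sqlite3 module, not on Hive, so it only verifies the logic and says nothing about map-reduce cost. The time rows are filtered before the left join so that departments without time rows survive, and identifiers are double-quoted here only to stay clear of SQLite keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table dept (id int, name text)')
conn.execute('create table "time" (id int, "date" text, "first" int, "last" int)')
conn.executemany("insert into dept values (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c"), (4, "d")])
conn.executemany('insert into "time" values (?, ?, ?, ?)',
                 [(1, "1992-01-01", 1, 1),
                  (2, "1993-02-02", 1, 2), (2, "1993-03-03", 2, 1),
                  (3, "1993-01-01", 1, 3), (3, "1994-01-01", 2, 2),
                  (3, "1995-01-01", 3, 1)])

# One pass over "time": pick the first/last dates with conditional max()
rows = conn.execute("""
    select d.id,
           max(d.name) as name,
           max(case when t."first" = 1 then t."date" end) as firstdate,
           max(case when t."last" = 1 then t."date" end) as lastdate
    from dept d
    left join (select id, "date", "first", "last"
               from "time"
               where "first" = 1 or "last" = 1) t
        on d.id = t.id
    group by d.id
    order by d.id
""").fetchall()

for row in rows:
    print(row)
```

Department d has no time rows and still appears, with NULL for both dates, which matches the result table shown above.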

Microsoft Access query to duplicate ROW_NUMBER

Obviously there are a bunch of questions about ROW_NUMBER in MS Access, and the usual response is that it does not exist and that you should use a COUNT(*) to create something similar. Unfortunately, doing so does not give me the results that I need.
My data looks like:
RID | QID
---------
1 | 1
1 | 2
1 | 3
1 | 3
2 | 1
2 | 2
2 | 2
What I am trying to get at is a unique count over RID and QID so that my query output looks like
RID | QID | SeqID
------------------
1 | 1 | 1
1 | 2 | 1
1 | 3 | 1
1 | 3 | 2
2 | 1 | 1
2 | 2 | 1
2 | 2 | 2
Using the COUNT(*) I get:
RID | QID | SeqID
------------------
1 | 1 | 1
1 | 2 | 2
1 | 3 | 3
1 | 3 | 3
2 | 1 | 1
2 | 2 | 2
2 | 2 | 2
My current query is:
SELECT
d.RID
,d.QID
,(SELECT
COUNT(*)
FROM
Data as d2
WHERE
d2.RID = d.RID
AND d2.QID < d.QID) + 1 AS SeqID
FROM
Data as d
ORDER BY
d.RID
,d.QID
Any help would be greatly appreciated.
As Matt's comment implied, the only way to make this work is if you have some column in your table that can uniquely identify each row.
Based on what you have posted, you don't seem to have that. If that's the case, consider adding a new auto increment numeric column that can serve that purpose. Let's pretend that you call that new column id.
With that in place, the following query will work:
select t.rid, t.qid,
(select count(*)
from data t2
where t2.rid = t.rid
and t2.qid = t.qid
and t2.id <= t.id) as SeqID
from data t
order by t.rid, t.qid
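Here is a minimal sketch of that query on the sample data, run on SQLite through Python's sqlite3 module instead of Access. The id column is the hypothetical auto-increment column suggested above, and t.id is added to the ORDER BY only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is the assumed auto-increment column that makes each row unique
conn.execute("create table data (id integer primary key autoincrement, rid int, qid int)")
conn.executemany("insert into data (rid, qid) values (?, ?)",
                 [(1, 1), (1, 2), (1, 3), (1, 3), (2, 1), (2, 2), (2, 2)])

# SeqID = number of rows with the same rid/qid and an id up to the current one
rows = conn.execute("""
    select t.rid, t.qid,
           (select count(*)
            from data t2
            where t2.rid = t.rid
              and t2.qid = t.qid
              and t2.id <= t.id) as SeqID
    from data t
    order by t.rid, t.qid, t.id
""").fetchall()

for row in rows:
    print(row)
```

Duplicated rid/qid pairs are numbered 1 and 2, and every other pair gets 1, as in the desired output.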

How to get an empty result table with FULL JOIN?

I will show you the content of some tables.
Bolek=> SELECT id, description from "TOMBInput";
id | description
----+-------------------
1 | Virtual Input 111
2 | Virtual Input 112
3 | Virtual Input 113
4 | Virtual Input 114
(4 rows)
Bolek=> SELECT id, setup_id FROM "TRBTOMBConnection";
id | setup_id
----+----------
1 | 1
2 | 1
3 | 1
4 | 1
(4 rows)
Bolek=> SELECT id, setname FROM "Setup";
id | setname
----+-------------
1 | SETUP_00001
(1 row)
Bolek=> SELECT id, setup_id FROM "Run";
id | setup_id
----+----------
1 | 1
(1 row)
My query [1] is
SELECT
"TOMBInput".id AS tombinput_id,
"TRBTOMBConnection".id AS trbtombconnection_id,
"Setup".id AS setup_id,
"Run".id AS run_id
FROM "TOMBInput"
INNER JOIN "TRBTOMBConnection" ON "TOMBInput".id = "TRBTOMBConnection".tombinput_id
FULL JOIN "Setup" ON "TRBTOMBConnection".id = "Setup".id
FULL JOIN "Run" ON "Setup".id = "Run".id AND "Run".id = 1;
Result table
tombinput_id | trbtombconnection_id | setup_id | run_id
--------------+----------------------+----------+--------
1 | 1 | 1 | 1
2 | 2 | |
3 | 3 | |
4 | 4 | |
(4 rows)
The question is:
I would like to get a table like
tombinput_id | trbtombconnection_id | setup_id | run_id
--------------+----------------------+----------+--------
1 | 1 | 1 | 1
2 | 2 | 1 | 1
3 | 3 | 1 | 1
4 | 4 | 1 | 1
(4 rows)
because "TRBTOMBConnection" has 4 rows with setup_id==1
and "Run" has setup_id==1.
What is more, when I change the last line of my query [1] to
FULL JOIN "Run" ON "Setup".id = "Run".id AND "Run".id = 2;
(the "Run" table has no row with id==2) the result of the query is
tombinput_id | trbtombconnection_id | setup_id | run_id
--------------+----------------------+----------+--------
1 | 1 | 1 |
2 | 2 | |
3 | 3 | |
4 | 4 | |
| | | 1
(5 rows)
And that's OK, because I used FULL JOIN.
But in this case, when I run my query [1],
I would like to get an empty result table, because "Run" has no row with id==2 and it makes no sense to show anything, since everything starts from Run.
How should I change my query [1]?
You are confusing IDs. Join each table on its foreign key (setup_id) rather than id to id:
SELECT
"TOMBInput".id AS tombinput_id,
"TRBTOMBConnection".id AS trbtombconnection_id,
"Setup".id AS setup_id,
"Run".id AS run_id
FROM "TOMBInput"
INNER JOIN "TRBTOMBConnection" ON "TOMBInput".id = "TRBTOMBConnection".tombinput_id
INNER JOIN "Setup" ON "TRBTOMBConnection".setup_id = "Setup".id
INNER JOIN "Run" ON "Setup".id = "Run".setup_id AND "Run".id = 1;
I see no reason for full outer joins here.
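The sketch below runs the inner join version on SQLite through Python's sqlite3 module to show both behaviours the question asks for: four rows for run 1 and an empty result for run 2. The tombinput_id values are an assumption (the question does not list that column), chosen to match the result table shown, and rows_for_run is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table "TOMBInput" (id integer primary key, description text);
    create table "Setup" (id integer primary key, setname text);
    create table "TRBTOMBConnection" (id integer primary key,
                                      tombinput_id int, setup_id int);
    create table "Run" (id integer primary key, setup_id int);
    insert into "TOMBInput" values (1, 'Virtual Input 111'), (2, 'Virtual Input 112'),
                                   (3, 'Virtual Input 113'), (4, 'Virtual Input 114');
    insert into "Setup" values (1, 'SETUP_00001');
    -- tombinput_id values are assumed; the question does not show them
    insert into "TRBTOMBConnection" values (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 4, 1);
    insert into "Run" values (1, 1);
""")

def rows_for_run(run_id):
    """Return the joined rows for one run id, following the setup_id foreign keys."""
    return conn.execute("""
        select "TOMBInput".id, "TRBTOMBConnection".id, "Setup".id, "Run".id
        from "TOMBInput"
        inner join "TRBTOMBConnection"
            on "TOMBInput".id = "TRBTOMBConnection".tombinput_id
        inner join "Setup" on "TRBTOMBConnection".setup_id = "Setup".id
        inner join "Run" on "Setup".id = "Run".setup_id and "Run".id = ?
        order by "TOMBInput".id
    """, (run_id,)).fetchall()

print(rows_for_run(1))
print(rows_for_run(2))
```

Because every join is an inner join, a run id that does not exist removes all rows, which gives the empty result wanted for id 2.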