I have two tables containing different data, linked by a column "id", except that each id is repeated multiple times.
For example,
Table 1:
id grade
1 A
1 C
Table 2:
Id company
1 Alpha
1 Beta
1 Charlie
The number of rows is inconsistent: table 1 may have more, fewer, or the same number of rows as table 2. How can I combine/merge them into this outcome:
id grade company
1 A Alpha
1 C Beta
1 Charlie
I am using a Microsoft Access query.
This is a real pain in MS Access. But you can do it by using a subquery to generate sequence numbers. Here is one method assuming that the rows are unique:
select id, max(grade) as grade, max(company) as company
from ((select t1.id, t1.grade, null as company,
              (select count(*)
               from table1 as tt1
               where tt1.id = t1.id and tt1.grade <= t1.grade
              ) as seqnum
       from table1 as t1
      ) union all
      (select t2.id, null as grade, t2.company,
              (select count(*)
               from table2 as tt2
               where tt2.id = t2.id and tt2.company <= t2.company
              ) as seqnum
       from table2 as t2
      )
     ) as t12
group by id, seqnum;
This would be much simpler in almost any other database.
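To see how the sequence-number trick lines the rows up, here is a minimal sketch that runs the same idea against the sample data. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for Access (that substitution is an assumption, and the syntax may need small adjustments in Access itself).

```python
import sqlite3

# In-memory SQLite database standing in for the Access tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, grade TEXT);
    CREATE TABLE table2 (id INTEGER, company TEXT);
    INSERT INTO table1 VALUES (1, 'A'), (1, 'C');
    INSERT INTO table2 VALUES (1, 'Alpha'), (1, 'Beta'), (1, 'Charlie');
""")

# Each branch numbers its rows per id by counting rows that sort at or
# before the current one; GROUP BY id, seqnum then pairs row n with row n.
merge_sql = """
SELECT id, MAX(grade) AS grade, MAX(company) AS company
FROM (
    SELECT t1.id, t1.grade, NULL AS company,
           (SELECT COUNT(*) FROM table1 AS tt1
             WHERE tt1.id = t1.id AND tt1.grade <= t1.grade) AS seqnum
    FROM table1 AS t1
    UNION ALL
    SELECT t2.id, NULL, t2.company,
           (SELECT COUNT(*) FROM table2 AS tt2
             WHERE tt2.id = t2.id AND tt2.company <= t2.company) AS seqnum
    FROM table2 AS t2
) AS t12
GROUP BY id, seqnum
ORDER BY id, seqnum
"""
merged = conn.execute(merge_sql).fetchall()
for row in merged:
    print(row)
```

The third row comes back with no grade because table1 has only two rows for id 1. The pairing relies on the values being unique within each id; ties would receive the same sequence number.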
I have the following situation:
I had a table with a unique constraint on the serie_id and user_id fields. I dropped it to try something, and now I have duplicated rows (i.e., two or more rows where the pair user_id and serie_id is equal).
To see the duplicated rows, I use this:
SELECT t1.id
FROM table_A t1
INNER JOIN table_A t2
ON t1.serie_id = t2.serie_id AND t1.user_id = t2.user_id
WHERE t1.id < t2.id
but the table has a lot of data, so it takes too long. Is there a way to optimize it or speed it up?
Edit: I am now using this query to get all the ids of the duplicated rows:
SELECT id
FROM table_A a
WHERE EXISTS (SELECT 1
FROM table_A b
WHERE a.user_id = b.user_id AND a.serie_id = b.serie_id
HAVING Count(*) > 1)
Order by id desc
It also takes a lot of time, more than half an hour.
Also, for each duplicated record I want to keep the original one. How can I exclude it from the results of this query?
I cannot use OVER or ROW_NUMBER as I saw in other comments; my version doesn't allow it.
Sample Data:
id serie_id user_id
1 100 111
2 100 222
3 100 222
4 58 222
5 100 115
6 100 222
I want to delete the first two rows corresponding to the pair serie_id = 100, user_id = 222,
so the output would be:
id serie_id user_id
1 100 111
4 58 222
5 100 115
6 100 222
You should define an index on the fields that you use in the inner join,
and also on the fields that you use in WHERE clauses.
You can enable "Include Actual Execution Plan" in SQL Server Management Studio. SQL Server then suggests tips for improving the performance of your queries.
To see the duplicate pair, you could use a query like this:
SELECT t1.serie_id, t1.user_id, COUNT(*) CNT
FROM table_A t1
GROUP BY t1.serie_id, t1.user_id
HAVING COUNT(*) > 1
And to return the actual rows, store the result in a temporary table and join it to the source table, like:
IF OBJECT_ID('tempdb.dbo.#tmp') IS NOT NULL DROP TABLE #tmp
CREATE TABLE #tmp ( serie_id INT, user_id INT, CNT INT)
INSERT INTO #tmp( serie_id, user_id, CNT )
SELECT t1.serie_id, t1.user_id, COUNT(*) CNT
FROM table_A t1
GROUP BY t1.serie_id, t1.user_id
HAVING COUNT(*) > 1
SELECT t1.*
FROM table_A t1 INNER JOIN #tmp tmp on tmp.serie_id = t1.serie_id and tmp.user_id = t1.user_id
Anyway, an index on the serie_id, user_id columns should help.
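For the follow-up question of removing the duplicates while keeping one row per pair, here is a minimal sketch on the sample data. It runs in SQLite through Python's sqlite3 module rather than on the asker's server (an assumption), and the index name ix_table_A_pair is hypothetical. It keeps the row with the highest id in each (serie_id, user_id) pair, which matches the expected output in the question.

```python
import sqlite3

# In-memory SQLite stand-in for the real table (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_A (id INTEGER PRIMARY KEY, serie_id INTEGER, user_id INTEGER);
    INSERT INTO table_A VALUES
        (1, 100, 111), (2, 100, 222), (3, 100, 222),
        (4, 58, 222), (5, 100, 115), (6, 100, 222);
    -- Hypothetical index name; covers the grouping columns plus id.
    CREATE INDEX ix_table_A_pair ON table_A (serie_id, user_id, id);
""")

# One id to keep per (serie_id, user_id) pair: the highest one.
keep_sql = "SELECT MAX(id) FROM table_A GROUP BY serie_id, user_id"

# Preview the rows that would be removed before deleting anything.
redundant_ids = [
    r[0] for r in conn.execute(
        f"SELECT id FROM table_A WHERE id NOT IN ({keep_sql}) ORDER BY id"
    )
]
print(redundant_ids)

conn.execute(f"DELETE FROM table_A WHERE id NOT IN ({keep_sql})")
remaining = conn.execute(
    "SELECT id, serie_id, user_id FROM table_A ORDER BY id"
).fetchall()
print(remaining)
```

Swapping MAX(id) for MIN(id) would keep the earliest row of each pair instead. Some databases do not allow a DELETE to read from its own target table in a subquery; there the ids to keep can be staged in a temporary table first.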
age | name | course | score
_________________________
10 |James | Math | 10
10 |James | Lab | 15
12 |Oliver | Math | 15
13 |William | Lab | 13
I want to select the records where Math >= 10 and Lab > 11.
I wrote this query:
select * from mytable
where (course='Math' and score>10) and (course='Lab' and score>11)
but this query does not return any records.
I want this result:
age | name
____________
10 |James
The condition (Math >= 10 and Lab > 11) is generated dynamically and may contain 2 conditions, or 100, or more.
Please help me.
Your query looks for records that satisfy both conditions at once, which cannot happen, since each record has a single course.
You want a condition that applies across rows having the same name, so this suggests aggregation instead:
select age, name
from mytable
where course in ('Math', 'Lab')
group by age, name
having
max(case when course = 'Math' then score end) >= 10
and max(case when course = 'Lab' then score end) > 11
If you want the names, then use aggregation and a having clause:
select name, age
from mytable
where (course = 'Math' and score >= 10) or
(course = 'Lab' and score > 11)
group by name, age
having count(distinct course) = 2;
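Since the question says the conditions are generated dynamically, the same aggregation pattern can be built from a list. Below is a minimal sketch, assuming SQLite through Python's sqlite3 module and a hypothetical helper named find_students that takes (course, operator, threshold) tuples, one per distinct course. Values are bound as parameters and operators are checked against a whitelist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (age INTEGER, name TEXT, course TEXT, score INTEGER);
    INSERT INTO mytable VALUES
        (10, 'James', 'Math', 10), (10, 'James', 'Lab', 15),
        (12, 'Oliver', 'Math', 15), (13, 'William', 'Lab', 13);
""")

ALLOWED_OPS = {">", ">=", "<", "<=", "="}

def find_students(conn, conditions):
    """Return (name, age) pairs that satisfy every (course, op, threshold)."""
    if not conditions:
        return []
    clauses, params = [], []
    for course, op, threshold in conditions:
        if op not in ALLOWED_OPS:  # never interpolate untrusted operators
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f"(course = ? AND score {op} ?)")
        params += [course, threshold]
    # A student qualifies only if a matching row exists for every course.
    params.append(len({course for course, _, _ in conditions}))
    sql = (
        "SELECT name, age FROM mytable WHERE " + " OR ".join(clauses)
        + " GROUP BY name, age HAVING COUNT(DISTINCT course) = ?"
        + " ORDER BY name"
    )
    return conn.execute(sql, params).fetchall()

matches = find_students(conn, [("Math", ">=", 10), ("Lab", ">", 11)])
print(matches)
```

Adding a third or a hundredth condition only appends another OR branch and raises the required distinct-course count, so the query shape stays the same.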
If you want the detailed records, use window functions:
select t.*
from (select t.*,
             (dense_rank() over (partition by name, age order by course asc) +
              dense_rank() over (partition by name, age order by course desc) - 1
             ) as cnt_unique_courses
      from mytable t
      where (course = 'Math' and score >= 10) or
            (course = 'Lab' and score > 11)
     ) t
where cnt_unique_courses = 2;
SQL Server doesn't support count(distinct) as a window function. But you can implement it by adding an ascending and a descending dense_rank() and subtracting 1.
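The arithmetic behind the dense_rank() trick: within a partition, the ascending rank plus the descending rank of any row always equals the number of distinct values plus one, so subtracting 1 gives the distinct count. A minimal sketch that shows this on the sample data, assuming SQLite 3.25 or later (for window functions) via Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (age INTEGER, name TEXT, course TEXT, score INTEGER);
    INSERT INTO mytable VALUES
        (10, 'James', 'Math', 10), (10, 'James', 'Lab', 15),
        (12, 'Oliver', 'Math', 15), (13, 'William', 'Lab', 13);
""")

# ascending rank + descending rank - 1 = distinct courses per (name, age)
course_counts = conn.execute("""
    SELECT name, course,
           DENSE_RANK() OVER (PARTITION BY name, age ORDER BY course ASC)
         + DENSE_RANK() OVER (PARTITION BY name, age ORDER BY course DESC)
         - 1 AS cnt_unique_courses
    FROM mytable
    ORDER BY name, course
""").fetchall()
for row in course_counts:
    print(row)
```

James has rows for two courses, so both of his rows report 2; Oliver and William each report 1.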
If you formulate the problem as:
Select all unique (name, age) combinations
That have a row for course Math with a score >= 10
And that have a row for course Lab with a score > 11
Then you can translate this to something very similar in SQL:
select distinct t1.age, t1.name -- unique combinations
from mytable t1
where exists ( select top 1 'x' -- with a row math score >= 10
from mytable t2
where t2.name = t1.name
and t2.age = t1.age
and t2.course = 'Math'
and t2.score >= 10 )
and exists ( select top 1 'x' -- with a row lab score > 11
from mytable t3
where t3.name = t1.name
and t3.age = t1.age
and t3.course = 'Lab'
and t3.score > 11 );
I think either your data or your condition is not right for the output you expect. Based on your condition, though, you can run each condition as a separate query and then use INTERSECT on the two selections to get your filtered data, as in the code below.
select Age,Name
from Table_1
where Course ='Math' and Score>=10
INTERSECT
select Age,Name
from Table_1
where Course ='Lab' and Score>11
You can write the query using a correlated subquery:
select * from table_1 t1
where score >11 and course ='Lab'
and [name] in (select [name] from table_1 t2 where t1.[name] =t2.[name] and t1.age =t2.Age
and t2.Score >=10 and course = 'Math')
I have the following table:
name email number type
1 abc#example.com 10 A
1 abc#example.com 10 B
2 def#def.com 20 B
3 ggg#ggg.com 30 B
1 abc#example.com 10 A
4 hhh#hhh.com 60 A
I want the following:
Result
name email number type
1 abc#example.com 10 A
1 abc#example.com 10 B
1 abc#example.com 10 A
Basically, I want to find the first lines where the three columns (name, email, number) are identical and see them, regardless of type.
How can I achieve this in SQL? I don't want a result with every combination once, I want to see every line that is in the table multiple times.
I thought of doing a group by but a group by gives me only the unique combinations and every line once. I tried it with a join on the table itself but somehow it got too bloated.
Any ideas?
EDIT: I want to display the type column as well, so group by isn't working and therefore, it's not a duplicate.
You can use exists for that case:
select t.*
from table t
where exists (select 1
from table
where name = t.name and email = t.email and
number = t.number and type <> t.type);
You can also use a window function if your DBMS supports it:
select *
from (select *, count(*) over (partition by name, email, number) Counter
from table
) t
where counter > 1;
Core SQL-99 compliant solution.
Have a sub-query that returns name, email, number combinations having duplicates. JOIN with that result:
select t1.*
from tablename t1
join (select name, email, number
from tablename
group by name, email, number
having count(*) > 1) t2
on t1.name = t2.name
and t1.email = t2.email
and t1.number = t2.number
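A minimal sketch of this join running against the sample rows from the question, using SQLite through Python's sqlite3 module. The table name contacts is hypothetical (the question does not name the table), and the ORDER BY on SQLite's rowid is only there to print the rows in insertion order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "contacts" is a hypothetical table name for the question's data.
    CREATE TABLE contacts (name INTEGER, email TEXT, number INTEGER, type TEXT);
    INSERT INTO contacts VALUES
        (1, 'abc#example.com', 10, 'A'),
        (1, 'abc#example.com', 10, 'B'),
        (2, 'def#def.com', 20, 'B'),
        (3, 'ggg#ggg.com', 30, 'B'),
        (1, 'abc#example.com', 10, 'A'),
        (4, 'hhh#hhh.com', 60, 'A');
""")

# The derived table finds the repeated combinations; the join brings back
# every original row for them, including the type column.
dup_rows = conn.execute("""
    SELECT t1.name, t1.email, t1.number, t1.type
    FROM contacts AS t1
    JOIN (SELECT name, email, number
          FROM contacts
          GROUP BY name, email, number
          HAVING COUNT(*) > 1) AS t2
      ON t1.name = t2.name
     AND t1.email = t2.email
     AND t1.number = t2.number
    ORDER BY t1.rowid
""").fetchall()
for row in dup_rows:
    print(row)
```

All three rows for name 1 come back, with types A, B and A, which is the result the question asks for.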
You can use window functions:
select t.*
from (select t.*, count(*) over (partition by name, email, number) as cnt
from t
) t
where cnt > 1;
If you only want combos that have different types (which might be your real problem), I would suggest exists:
select t.*
from t
where exists (select 1
from t t2
where t2.name = t.name and t2.email = t.email and t2.number = t.number and t2.type <> t.type
);
For performance, you want an index on (name, email, number, type) for this version.
Given the SQL table
id date employee_type employee_level
1 10/01/2015 other 2
1 09/13/2011 full-time 1
1 09/25/2010 intern 1
2 09/25/2013 full-time 3
2 09/25/2011 full-time 2
2 09/25/2008 full-time 1
3 09/23/2015 full-time 5
3 09/23/2013 full-time 4
Is it possible to search for ids that have one row with employee_type "intern", and the row above it in the table (same id with later date) with employee_type "full-time".
In this case, id 1 meets my requirement.
Thanks a lot!
Assuming that you mean the row with the same id and the next later date, you can use lead(), an ANSI standard function supported by most databases:
select t.*
from table t
where t.id in (select id
               from (select t.*,
                            lead(employee_type) over (partition by id order by date) as next_et
                     from table t
                    ) tt
               where tt.employee_type = 'intern' and tt.next_et = 'full-time'
              );
If your database doesn't support lead(), you can do something similar with correlated subqueries.
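As a sketch of the correlated-subquery route, the query below looks up, for each intern row, the employee_type of the row with the same id and the next later date, which is the reading the question describes. It runs in SQLite through Python's sqlite3 module; the table name employees is hypothetical, the dates are rewritten in ISO format so that text comparison orders them correctly (an assumption about how they are stored), and LIMIT 1 is SQLite syntax (other databases use TOP 1 or similar).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "employees" is a hypothetical name; dates stored as ISO text.
    CREATE TABLE employees (
        id INTEGER, date TEXT, employee_type TEXT, employee_level INTEGER
    );
    INSERT INTO employees VALUES
        (1, '2015-10-01', 'other', 2),
        (1, '2011-09-13', 'full-time', 1),
        (1, '2010-09-25', 'intern', 1),
        (2, '2013-09-25', 'full-time', 3),
        (2, '2011-09-25', 'full-time', 2),
        (2, '2008-09-25', 'full-time', 1),
        (3, '2015-09-23', 'full-time', 5),
        (3, '2013-09-23', 'full-time', 4);
""")

# For each intern row, fetch the type of the chronologically next row
# with the same id and require it to be 'full-time'.
intern_then_full_time = [
    r[0] for r in conn.execute("""
        SELECT DISTINCT e.id
        FROM employees AS e
        WHERE e.employee_type = 'intern'
          AND (SELECT nxt.employee_type
               FROM employees AS nxt
               WHERE nxt.id = e.id AND nxt.date > e.date
               ORDER BY nxt.date
               LIMIT 1) = 'full-time'
    """)
]
print(intern_then_full_time)
```

Only id 1 qualifies: its intern row from 2010 is followed by a full-time row in 2011.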
I believe the request isn't as described in the question; instead, what you appear to want is a list of all rows for people who have been interns.
SELECT
t1.*
FROM yourtable AS t1
INNER JOIN (
SELECT DISTINCT
id
FROM yourtable
WHERE employee_type = 'intern'
) AS t2 ON t1.id = t2.id
;
Alternatively, you might want only those people who have been both 'intern' and 'full-time', in which case you could use the query below, which uses a HAVING clause:
SELECT
t1.*
FROM yourtable AS t1
INNER JOIN (
SELECT id
FROM yourtable
WHERE employee_type = 'intern'
OR employee_type = 'full-time'
GROUP BY id
HAVING COUNT(DISTINCT employee_type) > 1
) AS t2 ON t1.id = t2.id
;
This is sybase 15.
Here's my problem.
I have 2 tables.
t1.jobid t1.date
------------------------------
1 1/1/2012
2 4/1/2012
3 2/1/2012
4 3/1/2012
t2.jobid t2.userid t2.status
-----------------------------------------------
1 100 1
1 110 1
1 120 2
1 130 1
2 100 1
2 130 2
3 100 1
3 110 1
3 120 1
3 130 1
4 110 2
4 120 2
I want to find all the people whose status for THEIR two most recent jobs is 2.
My plan was to take the top 2 of a derived table that joined t1 and t2 and was ordered by date descending for a given user. So the top two would be the most recent for a given user.
That would give me that individual's most recent job numbers. Not everybody is in every job.
Then I was going to make an outer query that joined against the derived table, searching for status 2s with a HAVING sum(status) = 4 or something like that. That would find the people with two status 2s.
But Sybase won't let me use an ORDER BY clause in the derived table.
Any suggestions on how to go about this?
I can always write a little program to loop through all the users, but I was going to try to make one horrendous SQL statement out of it.
Juicy one, no?
You could rank the rows in the subquery by adding an extra column using a window function. Then select the rows that have the appropriate ranks within their groups.
I've never used Sybase, but the documentation seems to indicate that this is possible.
With Table1 As
(
Select 1 As jobid, '1/1/2012' As [date]
Union All Select 2, '4/1/2012'
Union All Select 3, '2/1/2012'
Union All Select 4, '3/1/2012'
)
, Table2 As
(
Select 1 jobid, 100 As userid, 1 as status
Union All Select 1,110,1
Union All Select 1,120,2
Union All Select 1,130,1
Union All Select 2,100,1
Union All Select 2,130,2
Union All Select 3,100,1
Union All Select 3,110,1
Union All Select 3,120,1
Union All Select 3,130,1
Union All Select 4,110,2
Union All Select 4,120,2
)
, MostRecentJobs As
(
Select T1.jobid, T1.date, T2.userid, T2.status
, Row_Number() Over ( Partition By T2.userid Order By T1.date Desc ) As JobCnt
From Table1 As T1
Join Table2 As T2
On T2.jobid = T1.jobid
)
Select *
From MostRecentJobs As M2
Where Not Exists (
Select 1
From MostRecentJobs As M1
Where M1.userid = M2.userid
And M1.JobCnt <= 2
And M1.status <> 2
)
And M2.JobCnt <= 2
I'm using a number of features here which do exist in Sybase 15. First, I'm using common table expressions both for my sample data and to chain my queries together. Second, I'm using the ranking function Row_Number to order the jobs by date.
It should be noted that in the example data you gave, no user satisfies the requirement of having their two most recent jobs both be of status "2".
__
Edit
If you are using a version of Sybase that does not support ranking functions (e.g. Sybase 15 prior to 15.2), then you need to simulate the ranking function using counts.
Create Table #JobRnks
(
jobid int not null
, userid int not null
, status int not null
, [date] datetime not null
, JobCnt int not null
, Primary Key ( jobid, userid, [date] )
)
Insert #JobRnks( jobid, userid, status, [date], JobCnt )
Select T1.jobid, T1.userid, T1.status, T1.[date], Count(T2.jobid)+ 1 As JobCnt
From (
Select T1.jobid, T2.userid, T2.status, T1.[date]
From #Table2 As T2
Join #Table1 As T1
On T1.jobid = T2.jobid
) As T1
Left Join (
Select T1.jobid, T2.userid, T2.status, T1.[date]
From #Table2 As T2
Join #Table1 As T1
On T1.jobid = T2.jobid
) As T2
On T2.userid = T1.userid
And T2.[date] > T1.[date]
Group By T1.jobid, T1.userid, T1.status, T1.[date]
Select *
From #JobRnks As J1
Where Not Exists (
Select 1
From #JobRnks As J2
Where J2.userid = J1.userid
And J2.JobCnt <= 2
And J2.status <> 2
)
And J1.JobCnt <= 2
The reason for using the temp table here is for performance and ease of reading. Technically, you could plug in the query for the temp table into the two places used as a derived table and achieve the same result.
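To check the count-based ranking against the sample data, here is a minimal sketch that runs in SQLite through Python's sqlite3 module instead of Sybase (an assumption). It reads the sample dates as month/day/year and stores them as ISO text, and the view name user_jobs is hypothetical. A job's rank is one plus the number of later jobs for the same user, so rank 1 is the most recent job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (jobid INTEGER, date TEXT);
    CREATE TABLE t2 (jobid INTEGER, userid INTEGER, status INTEGER);
    INSERT INTO t1 VALUES
        (1, '2012-01-01'), (2, '2012-04-01'), (3, '2012-02-01'), (4, '2012-03-01');
    INSERT INTO t2 VALUES
        (1, 100, 1), (1, 110, 1), (1, 120, 2), (1, 130, 1),
        (2, 100, 1), (2, 130, 2),
        (3, 100, 1), (3, 110, 1), (3, 120, 1), (3, 130, 1),
        (4, 110, 2), (4, 120, 2);
    -- Hypothetical helper view: one row per user and job, with the job date.
    CREATE VIEW user_jobs AS
        SELECT t2.userid, t2.jobid, t2.status, t1.date
        FROM t2 JOIN t1 ON t1.jobid = t2.jobid;
""")

# Rank without window functions: 1 + number of later jobs for the same user.
ranked_sql = """
SELECT a.userid, a.jobid, a.status,
       (SELECT COUNT(*) FROM user_jobs AS b
         WHERE b.userid = a.userid AND b.date > a.date) + 1 AS job_rank
FROM user_jobs AS a
"""

latest_two = conn.execute(
    f"SELECT userid, jobid, status FROM ({ranked_sql}) "
    "WHERE job_rank <= 2 ORDER BY userid, job_rank"
).fetchall()

# Users whose two most recent jobs both have status 2.
qualifying = [
    r[0] for r in conn.execute(
        f"SELECT userid FROM ({ranked_sql}) WHERE job_rank <= 2 "
        "GROUP BY userid "
        "HAVING COUNT(*) = 2 AND MIN(status) = 2 AND MAX(status) = 2 "
        "ORDER BY userid"
    )
]
print(latest_two)
print(qualifying)
```

On this data every user's second most recent job is job 3 with status 1, so no user qualifies, which agrees with the observation above about the example data.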