SQL Select Distinct with Conditional - sql

Table1 has columns (id, a, b, c, group). There are several rows that have the same group, but id is always unique. I would like to SELECT group,a,b FROM Table1 WHERE the group is distinct. However, I would like the returned data to be from the row with the greatest id for that group.
Thus, if we have the rows
(id=10, a=6, b=40, c=3, group=14)
(id=5, a=21, b=45, c=31, group=230)
(id=4, a=42, b=65, c=2, group=230)
I would like to return these 2 rows:
[group=14, a=6,b=40] and
[group=230, a=21,b=45] (because id=5 > id=4)
Is there a simple SELECT statement to do this?

Try:
select grp, a, b
from table1 where id in
(select max(id) from table1 group by grp)

You can do it using a self join or an inner-select. Here's inner select:
select `group`, a, b from Table1 AS T1
where id=(select max(id) from Table1 AS T2 where T1.`group` = T2.`group`)
And self-join method:
select T1.`group`, T2.a, T2.b from
(select max(id) as id,`group` from Table1 group by `group`) T1
join Table1 as T2 on T1.id=T2.id
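The two approaches above can be checked against the three sample rows from the question. This is a minimal sketch, assuming an in-memory SQLite database driven from Python; the column is named grp (as in the first answer) because group is a reserved word, and SQLite stands in for whatever engine you actually use.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in engine; "grp" replaces the reserved word "group".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, a INT, b INT, c INT, grp INT)")
conn.executemany(
    "INSERT INTO table1 (id, a, b, c, grp) VALUES (?, ?, ?, ?, ?)",
    [(10, 6, 40, 3, 14), (5, 21, 45, 31, 230), (4, 42, 65, 2, 230)],  # rows from the question
)

# Variant 1: keep rows whose id is the maximum id of some group.
in_rows = conn.execute("""
    SELECT grp, a, b
    FROM table1
    WHERE id IN (SELECT MAX(id) FROM table1 GROUP BY grp)
    ORDER BY grp
""").fetchall()

# Variant 2: join the table to a derived table of per-group maximum ids.
join_rows = conn.execute("""
    SELECT t2.grp, t2.a, t2.b
    FROM (SELECT MAX(id) AS id FROM table1 GROUP BY grp) t1
    JOIN table1 t2 ON t1.id = t2.id
    ORDER BY t2.grp
""").fetchall()

print(in_rows)
print(join_rows)
```

Both variants rely on id being unique across the whole table, which the question states.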

Use two selects. The inner select gets the greatest id for each group:
SELECT MAX(id) AS id FROM YourTable GROUP BY [group]
Your outer select joins to this derived table.
Think about it logically: the inner select gets a subset of the data you need.
The outer select inner joins to this subset and can get further data.
SELECT [group], a, b FROM YourTable INNER JOIN
(SELECT MAX(id) AS id FROM YourTable GROUP BY [group]) t
ON t.id = YourTable.id

SELECT mi.*
FROM (
SELECT DISTINCT grouper
FROM mytable
) md
JOIN mytable mi
ON mi.id =
(
SELECT id
FROM mytable mo
WHERE mo.grouper = md.grouper
ORDER BY
id DESC
LIMIT 1
)
If your table is MyISAM or id is not a PRIMARY KEY, then make sure you have a composite index on (grouper, id).
If your table is InnoDB and id is a PRIMARY KEY, then a simple index on grouper will suffice (id, being a PRIMARY KEY, will be implicitly included).
This will use an INDEX FOR GROUP-BY to build the list of distinct groupers, and for each grouper it will use the index access to find the maximal id.
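As a rough illustration of the query shape and the composite index, here is a sketch that runs the same statement in SQLite from Python. The index name ix_mytable_grouper_id and the sample rows are assumptions made for the example, and SQLite's planner is not MySQL's, so this only shows that the query returns the expected rows, not that the MySQL access path described above is used.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; index name and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, grouper INT, a INT, b INT)")
# Composite index on (grouper, id), as recommended when id is not part of every index.
conn.execute("CREATE INDEX ix_mytable_grouper_id ON mytable (grouper, id)")
conn.executemany(
    "INSERT INTO mytable (id, grouper, a, b) VALUES (?, ?, ?, ?)",
    [(10, 14, 6, 40), (5, 230, 21, 45), (4, 230, 42, 65)],
)

# For each distinct grouper, pick the single row with the greatest id.
latest = conn.execute("""
    SELECT mi.id, mi.grouper, mi.a, mi.b
    FROM (SELECT DISTINCT grouper FROM mytable) md
    JOIN mytable mi
      ON mi.id = (
          SELECT id
          FROM mytable mo
          WHERE mo.grouper = md.grouper
          ORDER BY id DESC
          LIMIT 1
      )
    ORDER BY mi.grouper
""").fetchall()

print(latest)
```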

Don't know how to do it in mysql. But the following code will work for MsSQL...
SELECT Y.* FROM
(
SELECT DISTINCT [group], MAX(id) ID
FROM Table1
GROUP BY [group]
) X
INNER JOIN Table1 Y ON X.ID = Y.ID

Related

oracle12c,sql,difference between count(*) and sum()

Tell me the difference between sql1 and sql2:
sql1:
select count(1)
from table_1 a
inner join table_2 b on a.key = b.key where a.id in (
select id from table_1 group by id having count(1) > 1
)
sql2:
select sum(a) from (
select count(1) as a
from table_1 a
inner join table_2 b on a.key = b.key group by a.id having count(1) > 1
)
Why is the output not the same?
The queries are not even similar. They are very different. Let's check the first one:
select count(1)
from table_1 a
inner join table_2 b
on a.key = b.key
where a.id in (
select id from table_1 group by id having count(1) > 1
) ;
You are first making an inner join:
select count(1)
from table_1 a
inner join table_2 b
on a.key = b.key
In this case, count(1) and count(*) are equivalent (count(id) is too, as long as id is never NULL). You are counting the joined rows: the pairs of rows from both tables that have the key field in common.
After that, you are enforcing this:
where a.id in (
select id from table_1 group by id having count(1) > 1
)
In other words, every id you keep must appear at least twice in table_1.
And lastly, you are doing this:
select count(1)
In other words, counting those elements. So, translated into English, you have done this:
get every record of table_1 and pair it with the records of table_2 that have the same key, keeping only the pairs that match
from the result above, keep only the rows whose table_1 id appears more than once in table_1
count that result
Let's see what happens with the second query:
select sum(a) from (
select count(1) as a
from table_1 a
inner join table_2 b
on a.key = b.key
group by a.id
having count(1) > 1
);
You are making the same inner join:
select count(1) as a
from table_1 a
inner join table_2 b
on a.key = b.key
but, you are grouping it by the id of the table:
group by a.id
and then keeping only those groups that contain more than one row:
having count(1) > 1
The result so far is the set of joined rows (those that have the key field in common in both tables), grouped by table_1.id. The HAVING clause then keeps only the ids that have at least two joined rows. Note that this count is taken after the join, so it depends on how many table_2 rows match, whereas the first query checks how often the id appears in table_1 alone. That is why the two results can differ. I presume that few records will match this strict criterion.
And lastly, you sum all those counts.
When you use count(*) you count ALL the rows. The SUM() function is an aggregate function that returns the sum of all or distinct values in a set of values.
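A small data set makes the difference concrete. The sketch below is an assumption-laden illustration: the rows are invented, the join column is called k instead of key, and SQLite (via Python) stands in for Oracle. The id 1 appears twice in table_1 but produces only one joined row, while the id 2 appears once in table_1 but produces two joined rows.

```python
import sqlite3

# Assumption: hypothetical rows; column "k" replaces "key"; SQLite stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INT, k TEXT);
    CREATE TABLE table_2 (k TEXT);
    INSERT INTO table_1 VALUES (1, 'k1'), (1, 'k2'), (2, 'k3');
    INSERT INTO table_2 VALUES ('k1'), ('k3'), ('k3');
""")

# sql1: filter on ids duplicated in table_1 itself, then count the joined rows.
sql1 = conn.execute("""
    SELECT COUNT(1)
    FROM table_1 t1
    INNER JOIN table_2 t2 ON t1.k = t2.k
    WHERE t1.id IN (SELECT id FROM table_1 GROUP BY id HAVING COUNT(1) > 1)
""").fetchone()[0]

# sql2: group the joined rows by id, keep groups with more than one joined row, sum the counts.
sql2 = conn.execute("""
    SELECT SUM(cnt) FROM (
        SELECT COUNT(1) AS cnt
        FROM table_1 t1
        INNER JOIN table_2 t2 ON t1.k = t2.k
        GROUP BY t1.id
        HAVING COUNT(1) > 1
    )
""").fetchone()[0]

print(sql1, sql2)
```

With these rows the first query counts only the single joined row for id 1, and the second counts the two joined rows for id 2.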

SQL Case Condition On Inner Join

I am currently trying to join a table to itself to check if for one email there exist two or more Ids.
I am trying to join my table with itself on its email. I then wanted to query my table with a case condition saying if the count of the email in the nested query > 1 then select the latest modified record in the outer table.
SELECT *
FROM table1 <-- outer table
WHERE email IN
(SELECT email, COUNT(*)
FROM table1 as src
INNER JOIN table1 ON src.Email = table1.Email AND src.Id = table1.id
GROUP BY src.Email)
How can I write a query to say if the count for the given email is greater than 1 then select the latest record from the outer table?
Why would you go through all that trouble? How about just selecting the last modified record:
select t1.*
from table1 t1
where t1.modified_dt = (select max(tt1.modified_dt)
from table1 tt1
where tt1.email = t1.email
);
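To see the correlated subquery at work, here is a minimal sketch using SQLite from Python. The table contents and the example.com addresses are hypothetical; modified_dt is stored as an ISO date string so that MAX compares dates correctly.

```python
import sqlite3

# Assumption: hypothetical rows; ISO-8601 date strings sort chronologically as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INT, email TEXT, modified_dt TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1, "a@example.com", "2019-12-01"),
        (2, "a@example.com", "2019-11-19"),
        (3, "b@example.com", "2019-12-15"),
    ],
)

# Keep each row whose modified_dt equals the latest modified_dt for its email.
latest_rows = conn.execute("""
    SELECT t1.*
    FROM table1 t1
    WHERE t1.modified_dt = (SELECT MAX(tt1.modified_dt)
                            FROM table1 tt1
                            WHERE tt1.email = t1.email)
    ORDER BY t1.id
""").fetchall()

print(latest_rows)
```

If two rows for the same email share the same latest modified_dt, this form returns both of them.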
Another way to do it using window functions:
DECLARE @Tab TABLE (ID INT, Email VARCHAR(100), LastModified DATE)
INSERT @Tab
VALUES (1,'testemail@none.com','2019-12-01'),
(2,'testemail@none.com','2019-11-19'),
(3,'otheremail@none.com','2019-12-15')
SELECT *
FROM(
SELECT ROW_NUMBER() OVER(PARTITION BY t.Email ORDER BY t.LastModified DESC) rn, t.*
FROM @Tab t
) t2
) t2
WHERE t2.rn = 1
If by latest you mean the latest id number (the maximum number) then this should help you
With cte AS
(
SELECT email,
COUNT(id) OVER (PARTITION BY email) AS CountOfIDs,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY ID DESC) AS IdIndex
FROM table1
)
SELECT *
FROM cte
WHERE CountOfIDs > 1 AND IdIndex = 1
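The window-function approach can be tried out as follows. This is only a sketch: the rows and the example.com addresses are hypothetical, id is added to the CTE's select list so the chosen row is visible, and it assumes a SQLite build of 3.25 or later (the first version with window functions) behind Python's sqlite3 module.

```python
import sqlite3

# Assumption: SQLite >= 3.25 (window functions); hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INT, email TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "a@example.com"), (2, "a@example.com"), (3, "b@example.com")],
)

# Count ids per email and number the rows per email from highest id to lowest.
dupes = conn.execute("""
    WITH cte AS (
        SELECT id,
               email,
               COUNT(id) OVER (PARTITION BY email) AS count_of_ids,
               ROW_NUMBER() OVER (PARTITION BY email ORDER BY id DESC) AS id_index
        FROM table1
    )
    SELECT id, email, count_of_ids
    FROM cte
    WHERE count_of_ids > 1 AND id_index = 1
""").fetchall()

print(dupes)
```

Only emails with more than one id survive the filter, and for each of them the row with the greatest id is returned.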

PostgreSQL: Error in left join

I am trying to join my master table to some sub-tables in PostgreSQL in a single select query. I am getting a syntax error and I have the feeling I am making a terrible mistake or doing something which is not allowed. The code:
Select
id,
length,
other_stuff
from my_table tbl1
Left join
(
Select
id,
height
from my_table2 tbl2) tbl2 using (id)
left join
-- I get syntax error here
(
With a as (select id from some_table),
b as (Select value from other_table)
Select id, value from a, b) tbl3 using (id)
order by tbl1.id
Can we use a WITH clause in the sub or nested queries of a left join, and is there a better way to do this?
UPDATE1
Well, I would like to add some more details. I have three select queries like this (having unique ID) and I want to join them based on ID.
Query1:
With a as (Select id, my_other records... from postgres_table1)
b as (select id, my_records... from postgres_table2)
c as (select id, my_record.. from postgres_table3, b)
Select
id,
my_records
from a left join c on some_condtion_with_a
order by 1
Second query:
Select
id, my_records
from
(
multiple_sub_queries_by_getting_records_from_c
)
Third Query:
With d as (select id, records.. from b),
e as (select id, records.. from d),
f as (select id, records.. from e)
select
id,
records..
from f
I tried to join them using left join. The first two queries were joined successfully, but when joining the third query I got the syntax error. Maybe I am complicating things, so I asked whether there is a better way to do it.
You are over complicating things. There is no need to use a derived table to outer join my_table2. And there is no need for a CTE plus a derived table to join the tbl3 alias:
Select id,
length,
other_stuff
from my_table tbl1
Left join my_table2 tbl2 using (id)
left join (
select st.id, ot.value
from some_table st
cross join other_table ot
) tbl3 using (id)
order by tbl1.id;
This assumes that the cross join you create with Select id, value from a, b is intended.
Not tested, but I think you need this. try:
with a as (select id from some_table),
b as (Select value from other_table)
Select
id,
length,
other_stuff
from my_table tbl1
Left join
(
Select
id,
height
from my_table2 tbl2
)
tbl2 using (id)
left join
(
Select id, value from a, b
)
tbl3 using (id)
order by tbl1.id
I've only ever seen/used WITH in the following format:
WITH
temptablename(columns) as (query),
temptablename2(columns) as (query),
...
temptablenameX(columns) as (query)
SELECT ...
i.e. they come first
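As an illustration of the "WITH comes first" layout, the sketch below declares the CTEs once at the top and then references them inside a joined derived table. It runs on SQLite through Python purely so that it is executable; the question is about PostgreSQL, and the table contents here are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INT, length INT);
    CREATE TABLE my_table2 (id INT, height INT);
    CREATE TABLE some_table (id INT);
    CREATE TABLE other_table (value TEXT);
    INSERT INTO my_table VALUES (1, 10), (2, 20);
    INSERT INTO my_table2 VALUES (1, 100);
    INSERT INTO some_table VALUES (1);
    INSERT INTO other_table VALUES ('x'), ('y');
""")

# CTEs a and b are declared first, then used inside the tbl3 derived table.
joined = conn.execute("""
    WITH a AS (SELECT id FROM some_table),
         b AS (SELECT value FROM other_table)
    SELECT id, length, height, value
    FROM my_table tbl1
    LEFT JOIN my_table2 tbl2 USING (id)
    LEFT JOIN (
        SELECT id, value FROM a, b   -- cross join of a and b, as in the question
    ) tbl3 USING (id)
    ORDER BY id, value
""").fetchall()

print(joined)
```

Because a and b are cross joined, id 1 comes back once per row of other_table, which is worth checking against what you actually intend.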
You'll probably find it easier to write queries if you use indentation to describe nesting levels. I like to put my SELECT, FROM, WHERE, GROUP BY and ORDER BY at one indent level, and then the table names, INNER JOIN, ON etc. one level further in:
SELECT
column
FROM
table
INNER JOIN
(
SELECT subcolumn FROM subtable WHERE subclause
) myalias
ON
table.id = myalias.whatever
WHERE
blah
Organising your indents every time you nest down a layer really helps. By giving everything that is "a table or a block of data like a table (i.e. a subquery)" the same indent, you can easily see the notional order in which the DB should retrieve the data.
Move your WITHs to the top of the statement; you will still use the alias names in place in the sub-subquery, of course.
Looking at your query, there isn't much point in your subqueries.. You don't do any grouping or particularly complex processing of the data, you just select an ID and another column and then join it in. Your query will be simpler if you don't do this:
SELECT
column
FROM
table
INNER JOIN
(
SELECT subcolumn FROM subtable WHERE subclause
) myalias
ON
table.id = myalias.whatever
WHERE
blah
Instead, do this:
SELECT
column
FROM
table
INNER JOIN
subtable
ON
table.id = subtable.id
WHERE
blah
Re your updated requirements, following the same pattern.
look for --my comments
With a as (Select id, my_other records... from postgres_table1),
b as (select id, my_records... from postgres_table2),
c as (select id, my_record.. from postgres_table3, b),
d as (select id, records.. from b),
e as (select id, records.. from d),
f as (select id, records.. from e)
SELECT * FROM
(
--your first
Select
id,
my_records
from a left join c on some_condtion_with_a
) Q1
LEFT OUTER JOIN
(
--your second
Select
id, my_records
from
(
multiple_sub_queries_by_getting_records_from_c
)
) Q2
ON Q1.XXXX = Q2.XXXX --fill this in !!!!!!!!!!!!!!!!!!!
LEFT OUTER JOIN
(
--your third
select
id,
records..
from f
) Q3
ON QX.XXXXX = Q3.XXXX --fill this in !!!!!!!!!!!!!!!!!!!
It'll work, but it might not be the prettiest or most necessary SQL arrangement. As both I and HWNN have said, you can rewrite a lot of these queries where you're just doing some simple selecting in your WITH. But it's likely that they're simple enough that the database optimizer can also see this and rewrite the query for you when it runs it.
Just remember to code clearly, and lay your indentation out nicely to stop it turning into a massive, unmaintainable, undebuggable spaghetti mess.

getting top row of joined table

I have 2 tables, tableA and tableB
tableA - id int
name varchar(50)
tableB - id int
fkid int
name varchar(50)
Both tables are joined between id and fkid.
Below are sample rows from tableA
Below is output from tableB
I want to join both tables and get only top row of joined table. So output will be like below
Id Name fkid
1 P1 1
2 P2 4
3 P3 null
Here is Sql fiddle
How can I achieve this with a single query? I know that I can loop through in my .NET code and retrieve the top rows, but I want it in a single query.
select a.id,a.name,b.fid from tableA a left join
(
select min(id) fid ,fkid from tableB group by fkid
)b
on a.id = b.fkid
select ta.id, ta.name, min(tb.id) from tableA ta
left join tableB tb on tb.fkid=ta.id
group by ta.id, ta.name
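Both of these queries can be checked against the expected output from the question. The sample tables were posted as images that are not available here, so the tableB rows below are hypothetical, chosen so that the first child id is 1 for P1, 4 for P2 and missing for P3. The sketch uses SQLite from Python only as a convenient runner.

```python
import sqlite3

# Assumption: tableB rows are hypothetical (the original sample data was an image).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INT, name VARCHAR(50));
    CREATE TABLE tableB (id INT, fkid INT, name VARCHAR(50));
    INSERT INTO tableA VALUES (1, 'P1'), (2, 'P2'), (3, 'P3');
    INSERT INTO tableB VALUES (1, 1, 'C1'), (2, 1, 'C2'), (3, 1, 'C3'),
                              (4, 2, 'C4'), (5, 2, 'C5');
""")

# Variant 1: left join to a derived table holding the smallest tableB id per fkid.
derived = conn.execute("""
    SELECT a.id, a.name, b.fid
    FROM tableA a
    LEFT JOIN (SELECT MIN(id) AS fid, fkid FROM tableB GROUP BY fkid) b
           ON a.id = b.fkid
    ORDER BY a.id
""").fetchall()

# Variant 2: left join first, then aggregate per tableA row.
grouped = conn.execute("""
    SELECT ta.id, ta.name, MIN(tb.id)
    FROM tableA ta
    LEFT JOIN tableB tb ON tb.fkid = ta.id
    GROUP BY ta.id, ta.name
    ORDER BY ta.id
""").fetchall()

print(derived)
print(grouped)
```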
You could do this:
;WITH CTE
AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY fkID ORDER BY ID) AS RowNbr,
tableB.*
FROM
tableB
)
SELECT
*
FROM
tableA
LEFT JOIN CTE
ON CTE.fkID=tableA.id
AND CTE.RowNbr=1
Demo here
Or without a CTE, using a derived table instead. Like this:
SELECT
*
FROM
tableA
LEFT JOIN
(
SELECT
ROW_NUMBER() OVER(PARTITION BY fkID ORDER BY ID) AS RowNbr,
tableB.*
FROM
tableB
) as tbl
ON tbl.fkID=tableA.id
AND tbl.RowNbr=1
Demo here
Update:
The reason I chose to do it with row_number is that if tableB has more columns than in the example, there is no need for an additional aggregate when you want to show more of them. For me personally, it is also clearer with an ORDER BY on the ID.

Select rows having distinct values for two fields

Pardon me for the title. I have a table like this:
There will be thousands of rows, and I want to select the rows that have the same group_id but whose vr_debit and vr_credit values are not equal; i.e., in the image shown, none of the rows satisfy this criterion. If there are two rows, say (6, 500.000, 0) and (6, 0, 600.000), I want them as the result. Hope you get the idea.
Thank you.
Calculate each group using SUM() which is an aggregate function and filter them using HAVING clause.
SELECT GROUP_ID, SUM(vr_debit) totalDebit, SUM(vr_credit) totalCredit
FROM TableName
GROUP BY GROUP_ID
HAVING SUM(vr_debit) <> SUM(vr_credit)
If you want to get the uncalculated rows, you can join the table to the subquery.
SELECT a.*
FROM TableName a
INNER JOIN
(
SELECT GROUP_ID
FROM TableName
GROUP BY GROUP_ID
HAVING SUM(vr_debit) <> SUM(vr_credit)
) b ON a.GROUP_ID = b.GROUP_ID
SQLFiddle Demo (for both queries)
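Here is a small runnable check of both queries. The rows are hypothetical: group 5 balances (500 debit against 500 credit) and group 6 does not (500 against 600, as in the question's example). Amounts are stored as integers to keep the comparison exact, and SQLite via Python is used only as a convenient runner.

```python
import sqlite3

# Assumption: hypothetical rows; integer amounts avoid floating-point comparison issues.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (group_id INT, vr_debit INT, vr_credit INT)")
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?, ?)",
    [(5, 500, 0), (5, 0, 500), (6, 500, 0), (6, 0, 600)],
)

# Per-group totals, keeping only the groups whose debit and credit sums differ.
totals = conn.execute("""
    SELECT group_id, SUM(vr_debit) AS totalDebit, SUM(vr_credit) AS totalCredit
    FROM TableName
    GROUP BY group_id
    HAVING SUM(vr_debit) <> SUM(vr_credit)
""").fetchall()

# The original rows belonging to those unbalanced groups.
unbalanced_rows = conn.execute("""
    SELECT a.*
    FROM TableName a
    INNER JOIN (
        SELECT group_id
        FROM TableName
        GROUP BY group_id
        HAVING SUM(vr_debit) <> SUM(vr_credit)
    ) b ON a.group_id = b.group_id
    ORDER BY a.vr_debit DESC
""").fetchall()

print(totals)
print(unbalanced_rows)
```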
Perhaps:
SELECT group_ID,
vr_debit,
vr_credit
FROM
dbo.TableName T1
WHERE
EXISTS(
SELECT 1 FROM dbo.TableName T2
WHERE T1.group_ID = T2.group_ID
AND T1.vr_debit <> T2.vr_debit
AND T1.vr_credit<> T2.vr_credit
AND T1.vr_debit <> T2.vr_credit
)
Also you can use this option
SELECT *
FROM dbo.test64 t
WHERE EXISTS (
SELECT 1
FROM dbo.test64 t2
WHERE t.group_id = t2.group_id
HAVING SUM(t2.vr_debit) - SUM(t2.vr_credit) != 0
)
Demo on SQLFiddle