T-SQL Subquery Question

I have two queries.
For each row returned by query1 I want to run query2. I don't want to use cursors. I tried several approaches using subqueries.
query1:
select
distinct
category,
Count(category) as CategoryCount
from
mytable
group by
category
query2:
select
top 3
Text,
Title,
Category
from
mytable
where
Category = '1'
Category = '1' is just a sample; the value should come from query1.

Try this. It numbers the rows within each category and keeps the first three, so no cursor is needed:
WITH TBL AS
(
SELECT TEXT, TITLE, CATEGORY,
COUNT(*) OVER(PARTITION BY CATEGORY) AS CATEGORYCOUNT,
ROW_NUMBER() OVER(PARTITION BY CATEGORY ORDER BY (SELECT 0)) AS RC
FROM MYTABLE
)
SELECT TEXT, TITLE, CATEGORY, CATEGORYCOUNT
FROM TBL
WHERE RC <= 3
ORDER BY CATEGORY
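To see the top-3-per-category pattern above in action, here is a minimal sketch that runs the same shape of query through Python's built-in sqlite3 module. SQLite (3.25 or later, for window functions) is only a stand-in for SQL Server here, the sample rows are made up, and the rows are ordered by Title instead of (SELECT 0) so the result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Text TEXT, Title TEXT, Category TEXT)")

# Hypothetical data: five rows in category '1', two rows in category '2'.
rows = [(f"text {c}{i}", f"title {c}{i}", c)
        for c, n in (("1", 5), ("2", 2)) for i in range(n)]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

query = """
WITH tbl AS (
    SELECT Text, Title, Category,
           COUNT(*) OVER (PARTITION BY Category) AS CategoryCount,
           ROW_NUMBER() OVER (PARTITION BY Category ORDER BY Title) AS rc
    FROM mytable
)
SELECT Title, Category, CategoryCount
FROM tbl
WHERE rc <= 3
ORDER BY Category, Title
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Category '1' contributes three rows (each carrying its full count of 5) and category '2' contributes both of its rows.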

Related

How to get max value and using group by clause

I have a query like this:
select transactions_id,
time_stamp,
clock
from times
group by transactions_id
having sum(distinct type) = 1
Now I would like to get the row with the max id in each group.
I tried the queries below, but they did not work:
select max(id),
transactions_id,
time_stamp,
clock
from times
group by transactions_id
having sum(distinct type) = 1
or
select transactions_id,
time_stamp,
clock
from times
group by transactions_id
having sum(distinct type) = 1
and max(id)
To summarize, I have three conditions:
type must be 1
group by transactions_id
max id
You can find aggregates in one query and join its result with the table to get the relevant rows.
select *
from times t1
join (
select transactions_id,
max(id) as id
from times
where type = 1
group by transactions_id
) t2 using (transactions_id, id);
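The aggregate-then-join idea above can be tried out with a small sketch. It uses Python's sqlite3 module as a stand-in engine, the sample rows are invented, the time_stamp column is left out for brevity, and the USING clause is spelled out as an explicit ON condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE times (id INTEGER, transactions_id INTEGER, type INTEGER, clock TEXT)"
)
# Hypothetical rows: transaction 100 has two type-1 rows and one type-2 row.
conn.executemany("INSERT INTO times VALUES (?, ?, ?, ?)", [
    (1, 100, 1, "08:00"),
    (2, 100, 1, "09:00"),
    (3, 100, 2, "17:00"),
    (4, 200, 1, "08:30"),
])

query = """
SELECT t1.id, t1.transactions_id, t1.clock
FROM times t1
JOIN (
    -- one row per transaction: the highest id among its type-1 rows
    SELECT transactions_id, MAX(id) AS id
    FROM times
    WHERE type = 1
    GROUP BY transactions_id
) t2 ON t1.transactions_id = t2.transactions_id AND t1.id = t2.id
ORDER BY t1.transactions_id
"""
latest = conn.execute(query).fetchall()
print(latest)
```

Row 3 has the highest id for transaction 100 but is excluded because its type is 2, so row 2 is the one returned.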
If I understand correctly, you can use the ANSI standard row_number() function:
select t.*
from (select t.*,
row_number() over (partition by transactions_id order by id desc) as seqnum
from times t
) t
where seqnum = 1;
I am not sure what having sum(distinct type) = 1 is meant to do; that condition is not explained in the question.

How to select the latter row in SQL

I have a result set that looks like this:
As you can see some of the contactID are repeated with same QuestionResponse. And there is one with a different QuestionResponse (the one with red lines).
I want to group this by ContactID, but select the latter row. Eg: In case of ContactID = 78100299, I want to select the row with CreateDate = 17:00:44.907 (or rowNum = 2).
I have tried this:
select
ContactID,
max(QuestionResponse) as QuestionResponse,
max(CreateDate) as CreateDate
from
theResultSet
group by
ContactID
This will NOT work because there could be QuestionResponse 2 and then 1 for the same contactID. In that case the latter one will be the one with response 1 not 2.
Thank you for your help.
I would use ROW_NUMBER() this way:
WITH Query AS
(
SELECT rowNum, ContactID, QuestionResponse, CreateDate,
ROW_NUMBER() OVER (PARTITION BY ContactID ORDER BY CreateDate DESC) Ordered
FROM theResultSet
)
SELECT * FROM Query WHERE Ordered=1
Number the rows within each ContactID group by CreateDate, descending
Keep only the rows numbered 1 (filter out every row whose number is <> 1)
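The two steps above can be sketched end to end as follows. This is a rough illustration using Python's sqlite3 module (SQLite 3.25+ for window functions) rather than the original engine, and the dates and responses are invented to mimic the case from the question where the later row has the smaller QuestionResponse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE theResultSet "
    "(rowNum INTEGER, ContactID INTEGER, QuestionResponse INTEGER, CreateDate TEXT)"
)
# Hypothetical rows: contact 78100299 answered 2 first and 1 later.
conn.executemany("INSERT INTO theResultSet VALUES (?, ?, ?, ?)", [
    (1, 78100299, 2, "2014-01-01 16:59:10.100"),
    (2, 78100299, 1, "2014-01-01 17:00:44.907"),
    (1, 78100300, 1, "2014-01-01 12:00:00.000"),
])

query = """
WITH numbered AS (
    SELECT ContactID, QuestionResponse, CreateDate,
           ROW_NUMBER() OVER (PARTITION BY ContactID
                              ORDER BY CreateDate DESC) AS Ordered
    FROM theResultSet
)
SELECT ContactID, QuestionResponse, CreateDate
FROM numbered
WHERE Ordered = 1
ORDER BY ContactID
"""
latest_rows = conn.execute(query).fetchall()
for row in latest_rows:
    print(row)
```

For contact 78100299 the query returns response 1 (the later row), whereas max(QuestionResponse) would have returned 2.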
This might work if your SQL Engine can handle it...
SELECT trs1.*
FROM theResultSet trs1
INNER JOIN
(SELECT ContactID, max(CreateDate) as CreateDate
FROM theResultSet
GROUP BY ContactID) trs2
ON trs1.ContactID = trs2.ContactID
AND trs1.CreateDate = trs2.CreateDate
The end result will be all rows from theResultSet where the creation date is the max creation date for that ContactID. If two rows for the same contact share that max CreateDate, both are returned.
This should work too:
SELECT
ContactID, QuestionResponse,CreateDate
FROM (
select rowNum, ContactID, QuestionResponse,CreateDate,
max(rowNum) over(partition by ContactID) as maxrow
from theResultSet
) x
WHERE rowNum=maxrow

SQL for counting rows and categorize

Is it possible to do the following for count >= 3, 4, 5, 6, 7, 8 etc.,
rather than repeating the entire code for each count category?
Insert into OnePlus (SELECT DISTINCT Id, Name, COUNT(DISTINCT StartDate) AS OnePlusDays
FROM DataTable
GROUP BY Id, Name
HAVING OnePlusDays >= 1)
Insert into TwoPlus (SELECT DISTINCT Id, Name, COUNT(DISTINCT StartDate) AS TwoPlusDays
FROM DataTable
GROUP BY Id, Name
HAVING TwoPlusDays >= 2)
Finally
SELECT Id, Name, "1+" AS Categories
FROM OnePlus
UNION
SELECT Id, Name, "2+" AS Categories
FROM TwoPlus
You only tagged the question with sql. Depending on whether you use MySQL or SQL Server, you may need to change the cast/convert call and the concatenation, but this query may help. You don't need DISTINCT on top of GROUP BY: grouping already returns one row per distinct Id, Name combination along with its count.
Of course, the table OnePlus is really what you call Categories.
Insert into OnePlus
SELECT Id, Name, convert(varchar(10), COUNT(DISTINCT StartDate)) + '+' AS Categories
FROM DataTable
GROUP BY Id, Name
In T-SQL you can write it as:
SELECT Id,
Name,
-- list the WHEN branches in descending order so the highest threshold matches first
CASE WHEN PlusDays >= 2 THEN '2+'
WHEN PlusDays >= 1 THEN '1+' END AS Categories
FROM
(
SELECT Id, Name, COUNT(DISTINCT StartDate) AS PlusDays
FROM #DataTable
GROUP BY Id, Name
) AS T
ORDER BY Id asc
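As a quick check of the CASE approach above, here is a small sketch run through Python's sqlite3 module instead of SQL Server. The table is a regular table named DataTable rather than the temp table #DataTable, and the names and dates are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DataTable (Id INTEGER, Name TEXT, StartDate TEXT)")
# Hypothetical rows: Ann has two distinct start dates (one repeated), Bob has one.
conn.executemany("INSERT INTO DataTable VALUES (?, ?, ?)", [
    (1, "Ann", "2020-01-01"),
    (1, "Ann", "2020-01-02"),
    (1, "Ann", "2020-01-02"),
    (2, "Bob", "2020-01-01"),
])

query = """
SELECT Id, Name,
       CASE WHEN PlusDays >= 2 THEN '2+'   -- highest threshold first
            WHEN PlusDays >= 1 THEN '1+' END AS Categories
FROM (
    SELECT Id, Name, COUNT(DISTINCT StartDate) AS PlusDays
    FROM DataTable
    GROUP BY Id, Name
) AS T
ORDER BY Id
"""
categories = conn.execute(query).fetchall()
print(categories)
```

Adding another threshold is then a single extra WHEN branch placed above the existing ones, rather than another table and another INSERT.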

How to find duplicate records in PostgreSQL

I have a PostgreSQL database table called "user_links" which currently allows the following duplicate fields:
year, user_id, sid, cid
The unique constraint is currently the first field called "id", however I am now looking to add a constraint to make sure the year, user_id, sid and cid are all unique but I cannot apply the constraint because duplicate values already exist which violate this constraint.
Is there a way to find all duplicates?
The basic idea will be using a nested query with count aggregation:
select * from yourTable ou
where (select count(*) from yourTable inr
where inr.sid = ou.sid) > 1
You can adjust the where clause in the inner query to narrow the search.
There is another good solution for that mentioned in the comments, (but not everyone reads them):
select Column1, Column2, count(*)
from yourTable
group by Column1, Column2
HAVING count(*) > 1
Or shorter:
SELECT (yourTable.*)::text, count(*)
FROM yourTable
GROUP BY yourTable.*
HAVING count(*) > 1
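Applied to the four columns from the question, the GROUP BY ... HAVING approach above might look like the sketch below. It runs against an in-memory SQLite database through Python's sqlite3 module as a stand-in for PostgreSQL, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_links "
    "(id INTEGER PRIMARY KEY, year INTEGER, user_id INTEGER, sid INTEGER, cid INTEGER)"
)
# Hypothetical rows: (1, 2, 3, 4) appears three times, the others once.
conn.executemany(
    "INSERT INTO user_links (year, user_id, sid, cid) VALUES (?, ?, ?, ?)",
    [(1, 2, 3, 4), (1, 2, 3, 4), (1, 2, 3, 4), (1, 1, 3, 8), (1, 1, 3, 9)],
)

query = """
SELECT year, user_id, sid, cid, COUNT(*) AS copies
FROM user_links
GROUP BY year, user_id, sid, cid
HAVING COUNT(*) > 1
"""
duplicates = conn.execute(query).fetchall()
print(duplicates)
```

Each returned row is one combination that would violate the planned unique constraint, together with how many copies exist.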
From "Find duplicate rows with PostgreSQL", here's a smart solution:
select * from (
SELECT id,
ROW_NUMBER() OVER(PARTITION BY column1, column2 ORDER BY id asc) AS Row
FROM tbl
) dups
where
dups.Row > 1
In order to make it easier I assume that you wish to apply a unique constraint only for column year and the primary key is a column named id.
In order to find duplicate values you should run,
SELECT year, COUNT(id)
FROM YOUR_TABLE
GROUP BY year
HAVING COUNT(id) > 1
ORDER BY COUNT(id);
Using the SQL statement above you get a table which contains all the duplicate years in your table. In order to delete all the duplicates except the latest entry (the one with the highest id), use the statement below, which joins the table to itself:
DELETE
FROM YOUR_TABLE A USING YOUR_TABLE B
WHERE A.year=B.year AND A.id<B.id;
You can join to the same table on the fields that would be duplicated and then anti-join on the id field. Select the id field from the first table alias (tn1) and then use the array_agg function on the id field of the second table alias. Finally, for the array_agg function to work properly, group the results by the tn1.id field. This produces a result set that contains the id of a record and an array of all the ids that fit the join conditions.
select tn1.id,
array_agg(tn2.id) as duplicate_entries
from table_name tn1 join table_name tn2 on
tn1.year = tn2.year
and tn1.sid = tn2.sid
and tn1.user_id = tn2.user_id
and tn1.cid = tn2.cid
and tn1.id <> tn2.id
group by tn1.id;
Obviously, id's that will be in the duplicate_entries array for one id, will also have their own entries in the result set. You will have to use this result set to decide which id you want to become the source of 'truth.' The one record that shouldn't get deleted. Maybe you could do something like this:
with dupe_set as (
select tn1.id,
array_agg(tn2.id) as duplicate_entries
from table_name tn1 join table_name tn2 on
tn1.year = tn2.year
and tn1.sid = tn2.sid
and tn1.user_id = tn2.user_id
and tn1.cid = tn2.cid
and tn1.id <> tn2.id
group by tn1.id
order by tn1.id asc)
select ds.id from dupe_set ds where not exists
(select de from unnest(ds.duplicate_entries) as de where de < ds.id)
This selects the lowest id in each group of duplicates (assuming the id is an increasing integer primary key). These are the ids that you would keep.
Inspired by Sandro Wiggers, I did something similar to this:
WITH ordered AS (
SELECT id,year, user_id, sid, cid,
rank() OVER (PARTITION BY year, user_id, sid, cid ORDER BY id) AS rnk
FROM user_links
),
to_delete AS (
SELECT id
FROM ordered
WHERE rnk > 1
)
DELETE
FROM user_links
USING to_delete
WHERE user_links.id = to_delete.id;
If you want to test it, change it slightly:
WITH ordered AS (
SELECT id,year, user_id, sid, cid,
rank() OVER (PARTITION BY year, user_id, sid, cid ORDER BY id) AS rnk
FROM user_links
),
to_delete AS (
SELECT id,year,user_id,sid, cid
FROM ordered
WHERE rnk > 1
)
SELECT * FROM to_delete;
This will give an overview of what is going to be deleted (there is no problem to keep year,user_id,sid,cid in the to_delete query when running the deletion, but then they are not needed)
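The preview-then-delete workflow above can be rehearsed on throwaway data. The sketch below uses Python's sqlite3 module (SQLite 3.25+), not PostgreSQL; because SQLite has no DELETE ... USING, the delete is written with WHERE id IN (...) instead, which is a substitution on my part. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_links "
    "(id INTEGER PRIMARY KEY, year INTEGER, user_id INTEGER, sid INTEGER, cid INTEGER)"
)
# Hypothetical rows; ids 1..6 are assigned in insertion order.
conn.executemany(
    "INSERT INTO user_links (year, user_id, sid, cid) VALUES (?, ?, ?, ?)",
    [(1, 2, 3, 4), (1, 2, 3, 4), (1, 1, 3, 8),
     (1, 2, 3, 4), (1, 1, 3, 8), (1, 1, 3, 9)],
)

# Shared CTE: rank rows inside each duplicate group by id (lowest id gets rank 1).
ORDERED_CTE = """
WITH ordered AS (
    SELECT id,
           rank() OVER (PARTITION BY year, user_id, sid, cid ORDER BY id) AS rnk
    FROM user_links
)
"""

# Step 1: preview which ids would be removed.
preview = conn.execute(
    ORDERED_CTE + "SELECT id FROM ordered WHERE rnk > 1 ORDER BY id"
).fetchall()

# Step 2: delete them, keeping the lowest id of every group.
conn.execute(
    ORDERED_CTE
    + "DELETE FROM user_links WHERE id IN (SELECT id FROM ordered WHERE rnk > 1)"
)
remaining = conn.execute("SELECT id FROM user_links ORDER BY id").fetchall()
print(preview, remaining)
```

After the delete, one row per (year, user_id, sid, cid) combination is left, so the unique constraint could then be added.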
In your case, because of the constraint you need to delete the duplicated records.
Find the duplicated rows
Organize them by created_at date - in this case I'm keeping the oldest
Delete the records with USING to filter the right rows
WITH duplicated AS (
SELECT id,
count(*)
FROM products
GROUP BY id
HAVING count(*) > 1),
ordered AS (
SELECT p.id,
p.created_at,
rank() OVER (partition BY p.id ORDER BY p.created_at) AS rnk
FROM products p
JOIN duplicated d ON d.id = p.id ),
products_to_delete AS (
SELECT id,
created_at
FROM ordered
-- rnk = 1 is the oldest row, which is kept; everything newer is deleted
WHERE rnk > 1
)
DELETE
FROM products
USING products_to_delete
WHERE products.id = products_to_delete.id
AND products.created_at = products_to_delete.created_at;
The following query is a simple way to check for duplicate rows and may perform better than the alternatives:
SELECT id, count(id)
FROM table1
GROUP BY id
HAVING count(id) > 1
begin;
create table user_links(id serial,year bigint, user_id bigint, sid bigint, cid bigint);
insert into user_links(year, user_id, sid, cid) values (null,null,null,null),
(null,null,null,null), (null,null,null,null),
(1,2,3,4), (1,2,3,4),
(1,2,3,4),(1,1,3,8),
(1,1,3,9),
(1,null,null,null),(1,null,null,null);
commit;
A set operation with DISTINCT ON and EXCEPT:
(select id, year, user_id, sid, cid from user_links order by 1)
except
select distinct on (year, user_id, sid, cid) id, year, user_id, sid, cid
from user_links order by 1;
EXCEPT ALL also works, since the serial id makes every row unique.
(select id, year, user_id, sid, cid from user_links order by 1)
except all
select distinct on (year, user_id, sid, cid)
id, year, user_id, sid, cid from user_links order by 1;
So far this works for both NULL and non-NULL values.
To delete:
with a as(
(select id, year, user_id, sid, cid from user_links order by 1)
except all
select distinct on (year, user_id, sid, cid)
id, year, user_id, sid, cid from user_links order by 1)
delete from user_links using a where user_links.id = a.id returning *;

How to select records with minimum price per group

I'd like to select each pair of two columns in a database, but only select the entry with the lowest price. As a result, I want to output the id and the price column.
But it does not work:
My table:
id | category | type | name | price
1 | car | pkw | honda | 1000.00
2 | car | pkw | bmw | 2000.00
SQL:
select min(price) price, id
from cartable
group by category, type
Result:
Column "cartable.id" must be present in GROUP-BY clause or used in an aggregate function.
If you want the entry with the lowest price, then calculate the lowest price and join the information back in:
select ct.*
from cartable ct join
(select category, type, min(price) as price
from cartable
group by category, type
) ctp
on ct.category = ctp.category and ct.type = ctp.type and ct.price = ctp.price;
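Here is a minimal sketch of the join-back approach above, run through Python's sqlite3 module as a stand-in engine. The first two rows come from the question; the third row (a second type) is invented so that more than one group shows up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cartable "
    "(id INTEGER, category TEXT, type TEXT, name TEXT, price REAL)"
)
conn.executemany("INSERT INTO cartable VALUES (?, ?, ?, ?, ?)", [
    (1, "car", "pkw", "honda", 1000.00),
    (2, "car", "pkw", "bmw", 2000.00),
    (3, "car", "lkw", "man", 5000.00),  # hypothetical extra row
])

query = """
SELECT ct.id, ct.price
FROM cartable ct
JOIN (
    -- lowest price per (category, type) pair
    SELECT category, type, MIN(price) AS price
    FROM cartable
    GROUP BY category, type
) ctp ON ct.category = ctp.category
     AND ct.type = ctp.type
     AND ct.price = ctp.price
ORDER BY ct.id
"""
cheapest = conn.execute(query).fetchall()
print(cheapest)
```

Note that if two rows in the same group tie for the lowest price, this form returns both of them.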
You can achieve this with a NOT EXISTS clause, keeping only rows for which no cheaper row exists in the same group:
SELECT *
FROM cartable ct
WHERE
NOT EXISTS (
SELECT *
FROM cartable ct2
WHERE ct2.type = ct.type and ct2.category = ct.category and ct2.price < ct.price)
For a speed comparison, you can try this:
SELECT DISTINCT ON (type, category) id, price
FROM cartable
ORDER BY type, category, price;
SELECT C.id, C.price
from cartable C
inner join
(
select min(price) as price , category, type
from cartable
group by category, type
)T
on T.category = C.category
and T.type = C.type
and T.price = C.price
Most of the time you can't do much other than resort to SELECT ... OVER:
select price, id
from(
select price, id, [rnk] = ROW_NUMBER() over( partition by category, type order by price)
from cartable
) as a
where [rnk]=1
Create an appropriate index and performance is good.
In your example something like this:
CREATE NONCLUSTERED INDEX [foo]
ON [dbo].[cartable] ([category],[type])
INCLUDE ([price])
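Maybe you can try a correlated subquery, so the minimum is computed per group rather than for the whole table:
select id, price from cartable c
where price = (select min(price) from cartable
where category = c.category and type = c.type);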