Get adjacent fields with GROUP BY / MAX - sql

This has probably been asked many times but I can't find a solution because I don't know how to phrase this question.
[product] [shop] [price] [date]
Pizza Shop1 10 2014-05-10
Pizza Shop2 12 2014-05-04
Snow Shop1 101 2014-05-02
Snow Shop3 93 2014-05-11
I wish to query this table and get the price of the last added product:
[product] [shop] [price] [date]
Pizza Shop1 10 2014-05-10
Snow Shop3 93 2014-05-11
An obviously wrong syntax:
SELECT
product,
shop WHERE MAX(date),
price WHERE MAX(date),
MAX(date)
FROM myTable
GROUP BY product
This query is already a part of a subquery so I want the best possible performing solution.

Using ROW_NUMBER() you can split the records in to partitions for each product, and assign each a row number, starting with 1 for the newest record for a partition (product).
Then you just select all the records with a row number of 1, for each product.
WITH
sequenced AS
(
SELECT
myTable.*,
ROW_NUMBER() OVER (PARTITION BY [product] ORDER BY [date] DESC) AS product_date_ordinal
FROM
myTable
)
SELECT
*
FROM
sequenced
WHERE
product_date_ordinal = 1
This assumes SQL SERVER 2005 onwards, and you should have an index on product, date DESC for best performance.

You can try this aswell. Using join with subquery to filter your table, not as good as first answer I guess, but easier to read and understand:
SELECT Table.*
FROM Table
RIGHT JOIN
(
SELECT product, max(date) AS date FROM Table
GROUP BY product
) AS Filter
ON Filter.product = Table.product AND Filter.date = Table.date
Just replace "Table" with the name of your table.

Related

Is it possible to create and use window function in the same query?

I'm using PostgreSQL and I have the following situation:
table of Sales (short version):
itemid quantity
5 10
5 12
6 1
table of stock (short version):
itemid stock
5 30
6 1
I have a complex query that also needs to present in one of it's columns the SUM of each itemid.
So it's going to be:
Select other things,itemid,stock, SUM (quantity) OVER (PARTITION BY itemid) AS total_sales
from .....
sales
stock
This query is OK. however this query will present:
itemid stock total_sales
5 30 22
6 1 1
But I don't need to see itemid=6 because the whole stock was sold. meaning that I need a WHERE condition like:
WHERE total_sales<stock
but I can't do that as the total_sales is created after the WHERE is done.
Is there a way to solve this without surrounding the whole query with another one? I'm trying to avoid it if I can.
You can use a subquery or CTE:
select s.*
from (Select other things,itemid,stock,
SUM(quantity) OVER (PARTITION BY itemid) AS total_sales
from .....
) s
where total_sales < stock;
You cannot use table aliases defined in a SELECT in the SELECT, WHERE, or FROM clauses for that SELECT. However, a subquery or CTE gets around this restriction.
You can also use an inner select in your WHERE statement like this:
SELECT *, SUM (quantity) OVER (PARTITION BY itemid) AS total_sales
FROM t
WHERE quantity <> (SELECT SUM(quantity) FROM t ti WHERE t.itemid = ti.itemid);

Finding the first occurrence of an element in a SQL database

I have a table with a column for customer names, a column for purchase amount, and a column for the date of the purchase. Is there an easy way I can find how much first time customers spent on each day?
So I have
Name | Purchase Amount | Date
Joe 10 9/1/2014
Tom 27 9/1/2014
Dave 36 9/1/2014
Tom 7 9/2/2014
Diane 10 9/3/2014
Larry 12 9/3/2014
Dave 14 9/5/2014
Jerry 16 9/6/2014
And I would like something like
Date | Total first Time Purchase
9/1/2014 73
9/3/2014 22
9/6/2014 16
Can anyone help me out with this?
The following is standard SQL and works on nearly all DBMS
select date,
sum(purchaseamount) as total_first_time_purchase
from (
select date,
purchaseamount,
row_number() over (partition by name order by date) as rn
from the_table
) t
where rn = 1
group by date;
The derived table (the inner select) selects all "first time" purchases and the outside the aggregates based on the date.
The two key concepts here are aggregates and sub-queries, and the details of which dbms you're using may change the exact implementation, but the basic concept is the same.
For each name, determine they're first date
Using the results of 1, find each person's first day purchase amount
Using the results of 2, sum the amounts for each date
In SQL Server, it could look like this:
select Date, [totalFirstTimePurchases] = sum(PurchaseAmount)
from (
select t.Date, t.PurchaseAmount, t.Name
from table1 t
join (
select Name, [firstDate] = min(Date)
from table1
group by Name
) f on t.Name=f.Name and t.Date=f.firstDate
) ftp
group by Date
If you are using SQL Server you can accomplish this with either sub-queries or CTEs (Common Table Expressions). Since there is already an answer with sub-queries, here is the CTE version.
First the following will identify each row where there is a first time purchase and then get the sum of those values grouped by date:
;WITH cte
AS (
SELECT [Name]
,PurchaseAmount
,[date]
,ROW_NUMBER() OVER (
PARTITION BY [Name] ORDER BY [date] --start at 1 for each name at the earliest date and count up, reset every time the name changes
) AS rn
FROM yourTableName
)
SELECT [date]
,sum(PurchaseAmount) AS TotalFirstTimePurchases
FROM cte
WHERE rn = 1
GROUP BY [date]

SQL - Only one result per set

I have a SQL problem (MS SQL Server 2012), where I only want one result per set, but have different items in some rows, so a group by doesn't work.
Here is the statement:
Select Deliverer, ItemNumber, min(Price)
From MyTable
Group By Deliverer, ItemNumber
So I want the deliverer with the lowest price for one item.
With this query I get the lowest price for each deliverer.
So a result like:
DelA 12345 1,25
DelB 11111 2,31
And not like
DelA 12345 1,25
DelB 12345 1,35
DelB 11111 2,31
DelC 11111 2,35
I know it is probably a stupid question with an easy solution, but I tried for about three hours now and just can't find a solution. Needles to say, I'm not very experienced with SQL.
Just Add an aggregate function to your deliverer field also, as appropriate (Either min or max). From your data, I guess you need min(deliverer) and hence use the below query to get your desired result.
Select mIN(Deliverer), ItemNumber, min(Price)
From MyTable
Group By ItemNumber;
EDIT:
Below query should help you get the deliverer with the lowest price item-wise:
SELECT TABA.ITEMNUMBER, TABA.MINPRICE, TABB.DELIVERER
FROM
(
SELECT ITEMNUMBER, MIN(PRICE) MINPRICE
FROM MYTABLE GROUP BY
ITEMNUMBER
) TABA JOIN
MYTABLE TABB
ON TABA.ITEMNUMBER=TABB.ITEMNUMBER AND
TABA.MINPRICE = TABB.PRICE
You should be able to do this with the RANK() (or DENSE_RANK()) functions, and a bit of partitioning, so something like:
; With rankings as (
SELECT Deliverer,
rankings.ItemNumber,
rankings.Price
RANK() OVER (PARTITION BY ItemNumber ORDER BY Price ASC) AS Ranking
FROM MyTable (Deliverer, ItemNumber, Price)
)
SELECT rankings.Deliverer,
rankings.ItemNumber,
rankings.Price
FROM rankings
WHERE ranking = 1

Grouping in T-SQL

Can you please advise how to make this special grouping in SQL
from this table
id FromDate ToDate UPC price IsGroupSpecial
3 2013-12-27 2013-12-30 6400000087492 315.00 1
2 2013-12-27 2013-12-31 6400000087492 405.00 0
Need to select all with min price but the id is not necessarily minimum - the id should be taken from that row where IsGroupSpecial= 0
I think this type of query is what you're looking for. If you're just looking for all records that have the minimum price than a group by clause is not necessary.
select *
from table
where price = (select min(price) from table))
If I understood correctly
select UPC, min(price), x.id
from table t1
cross apply (select id from table t2 where t2.IsGroupSpecial =0 and t1.UPC=t2.UPC) X
group by UPC, x.id

Find duplicate records in a table using SQL Server

I am validating a table which has a transaction level data of an eCommerce site and find the exact errors.
I want your help to find duplicate records in a 50 column table on SQL Server.
Suppose my data is:
OrderNo shoppername amountpayed city Item
1 Sam 10 A Iphone
1 Sam 10 A Iphone--->>Duplication to be detected
1 Sam 5 A Ipod
2 John 20 B Macbook
3 John 25 B Macbookair
4 Jack 5 A Ipod
Suppose I use the below query:
Select shoppername,count(*) as cnt
from dbo.sales
having count(*) > 1
group by shoppername
will return me
Sam 2
John 2
But I don't want to find duplicate just over 1 or 2 columns. I want to find the duplicate over all the columns together in my data. I want the result as:
1 Sam 10 A Iphone
with x as (select *,rn = row_number()
over(PARTITION BY OrderNo,item order by OrderNo)
from #temp1)
select * from x
where rn > 1
you can remove duplicates by replacing select statement by
delete x where rn > 1
SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as cnt
FROM dbo.sales
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1
SQL> SELECT JOB,COUNT(JOB) FROM EMP GROUP BY JOB;
JOB COUNT(JOB)
--------- ----------
ANALYST 2
CLERK 4
MANAGER 3
PRESIDENT 1
SALESMAN 4
Just add all fields to the query and remember to add them to Group By as well.
Select shoppername, a, b, amountpayed, item, count(*) as cnt
from dbo.sales
group by shoppername, a, b, amountpayed, item
having count(*) > 1
To get the list of multiple records use following command
select field1,field2,field3, count(*)
from table_name
group by field1,field2,field3
having count(*) > 1
Try this instead
SELECT MAX(shoppername), COUNT(*) AS cnt
FROM dbo.sales
GROUP BY CHECKSUM(*)
HAVING COUNT(*) > 1
Read about the CHECKSUM function first, as there can be duplicates.
Try this
with T1 AS
(
SELECT LASTNAME, COUNT(1) AS 'COUNT' FROM Employees GROUP BY LastName HAVING COUNT(1) > 1
)
SELECT E.*,T1.[COUNT] FROM Employees E INNER JOIN T1 ON T1.LastName = E.LastName
with x as (
select shoppername,count(shoppername)
from sales
having count(shoppername)>1
group by shoppername)
select t.* from x,win_gp_pin1510 t
where x.shoppername=t.shoppername
order by t.shoppername
First of all, I doubt that the result it not accurate? Seem like there are Three 'Sam' from the original table. But it is not critical to the question.
Then here we come for the question itself. Based on your table, the best way to show duplicate value is to use count(*) and Group by clause. The query would look like this
SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as RepeatTimes FROM dbo.sales GROUP BY OrderNo, shoppername, amountPayed, city, item HAVING COUNT(*) > 1
The reason is that all columns together from your table uniquely identified each record, which means the records will be considered as duplicate only when all values from each column are exactly the same, also you want to show all fields for duplicate records, so the group by will not miss any column, otherwise yes because you can only select columns that participate in the 'group by' clause.
Now I would like to give you any example for With...Row_Number()Over(...), which is using table expression together with Row_Number function.
Suppose you have a nearly same table but with one extra column called Shipping Date, and the value may change even the rest are the same. Here it is:
OrderNo shoppername amountpayed city Item Shipping Date
1 Sam 10 A Iphone 2016-01-01
1 Sam 10 A Iphone 2016-02-02
1 Sam 5 A Ipod 2016-03-03
2 John 20 B Macbook 2016-04-04
3 John 25 B Macbookair 2016-05-05
4 Jack 5 A Ipod 2016-06-06
Notice that row# 2 is not a duplicate one if you still take all columns as a unit. But what if you want to treat them as duplicate as well in this case? You should use With...Row_Number()Over(...), and the query would look like this:
WITH TABLEEXPRESSION
AS
(SELECT *,ROW_NUMBER() OVER (PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date] as Identifier) --if you consider the one with late shipping date as the duplicate
FROM dbo.sales)
SELECT * FROM TABLEEXPRESSION
WHERE Identifier !=1 --or use '>1'
The above query will give result together with Shipping Date, for example:
OrderNo shoppername amountpayed city Item Shipping Date Identifier
1 Sam 10 A Iphone 2016-02-02 2
Note this one is different from the one with 2016-01-01, and the reason why 2016-02-02 has been filtered out is PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date] as Identifier, and Shipping Date is NOT one of the column that need to be took care of for duplicate records, which means the one with 2016-02-02 still could be a perfect result for your question.
Now summarize it little bit, using count(*) and Group by clause together is the best choice when you only want to show all columns from Group byclause as the result, otherwise you will miss the columns that do not participate in group by.
While For With...Row_Number()Over(...), it is suitable in every scenario that you want to find duplicate records, however, it is little bit complicated to write the query and little bit over engineered compared to the former one.
If your purpose is to delete duplicate records from table, you have to use the later WITH...ROW_NUMBER()OVER(...)...DELETE FROM...WHERE one.
Hope this helps!
You can use below methods to find the output
with Ctec AS
(
select *,Row_number() over(partition by name order by Name)Rnk
from Table_A
)
select Name from ctec
where rnk>1
select name from Table_A
group by name
having count(*)>1
Select *
from dbo.sales
group by shoppername
having(count(Item) > 1)
Select EventID,count() as cnt
from dbo.EventInstances
group by EventID
having count() > 1
The following is running code:
SELECT abnno, COUNT(abnno)
FROM tbl_Name
GROUP BY abnno
HAVING ( COUNT(abnno) > 1 )