sql distinct with multiple fields requried - sql

I'm using Advantage Database Server by Sybase. I need to elimiate duplicate addbatch's from my report, but having trouble pulling up just distinct records. Any idea what I am missing?
here is what I am using
SELECT DISTINCT
SI.[addbatch] as [Batch#],
SI.[current account #] as [Account],
SI.[status date] as [Status Date],
SI.[SKU] as [SKU],
AC.[email address] as [Email]
FROM salesinventory SI, accounts AC
WHERE AC.[account #]=SI.[current account #] and [Status Date] > '6/1/2015'
I still get duplicate addbatch's though. I'm not sure where I am going wrong! Thanks in advance! Wasn't even sure how to google this question!

The problem is that you need to check uniqueness of a single column and that's not actually what your code is performing. Try this
SELECT *
FROM (SELECT SI.[addbatch] as [Batch#],
SI.[current account #] as [Account],
SI.[status date] as [Status Date],
ETC,
ROW_NUMBER() OVER (PARTITION BY [Batch#]) AS RowNumber
FROM salesinventory SI, accounts AC
WHERE AC.[account #]=SI.[current account #] and [Status Date] > '6/1/2015') as rec
WHERE rec.RowNumber = 1

-- The following code is generic to de-duplicate records, modify to suit your need.
select x.[Well_Name] as nameX
, x.[TestDate] as dateX
from (
SELECT count(*) as dup
,[Well_Name]
,[TestDate]enter code here
FROM [dbo].[WellTests]
group by [TestDate] ,[Well_Name] ) x
where dup > 1

If you want to have unique batch numbers in your result, you have to GROUP BY the batch field only.
Something like this should work:
SELECT
SI.[addbatch] as [Batch#],
MIN(SI.[current account #]) as [Account],
MIN(SI.[status date]) as [Status Date],
MIN(SI.[SKU]) as [SKU],
MIN(AC.[email address]) as [Email]
FROM salesinventory SI, accounts AC
WHERE AC.[account #]=SI.[current account #] and [Status Date] > '6/1/2015'
GROUP BY
SI.[addbatch]
You didn't say how you want to aggregate the other columns, just replace MIN with something that makes more sense for you, like SUM or COUNT, etc.
There is a topic about grouping in the documentation.
PS: SELECT DISTINCT is (basically) just a shorter way to GROUP BY on all columns without any aggregation.

Related

New to SQL. Would like to convert an IF(COUNTIFS()) Excel formula to SQL code and have SQL calculate it instead of Excel

I am running SQL Server 2008 R2 (RTM).
I have a SQL query that pulls Dates, Products, Customers and Units:
select
[Transaction Date] as Date,
[SKU] as Product,
[Customer Name] as Customer,
sum(Qty) as Units
from dataset
where [Transaction Date] < '2019-03-01' and [Transaction Date] >= '2016-01-01'
group by [Transaction Date], [SKU], [Customer Name]
order by [Transaction Date]
This pulls hundreds of thousands of records and I wanted to determine if a certain transaction was a new order or reorder based on the following logic:
Reorder: That specific Customer has ordered that specific product in the last 6 months
New Order: That specific Customer hasn’t ordered that specific product in the last 6 months
For that I have this formula in Excel that seems to be working:
=IF(COUNTIFS(A$1:A1,">="&DATE(YEAR(A2),MONTH(A2)-6,DAY(A2)),C$1:C1,C2,B$1:B1,B2),"Reorder","New Order")
The formula works when I paste it individually or in a smaller dataset, but when I try to copy paste it to all 500K+ rows, Excel gives up because it loops for each calculation.
This could probably be done in SQL, but I don’t have the knowledge on how to convert this excel formula to SQL, I just started studying it.
You're doing pretty well with the start of your query there. There are three additional functions you're looking to add to your query.
The first thing you'll need is the easiest. GETDATE() simply returns the current date. You'll need that when you're comparing the current date to the transaction date.
The second function is DATEDIFF, which will give you a unit of time between two dates (months, days, years, quarters, etc). Using DATEDIFF, you can say "is this date within the last 6 months". The format for this is pretty easy. It's DATEDIFF(interval, date1, date2).
The thrid function you're looking for is CASE, which allows you to tell SQL to give you one answer if one condition is met, but a different answer if a different condition is met. For your example, you can say "if the difference in days is < 60, return 'Reorder', if not give me 'New Order'".
Putting it all together:
SELECT CASE
WHEN DATEDIFF(MONTH, [Transaction Date], GETDATE()) <= 6
THEN 'Reorder'
ELSE 'New Order'
END as ORDER_TYPE
,[Transaction Date] AS DATE
,[SKU] AS PRODUCT
,[Customer Name] AS CUSTOMER
,Qty AS UNITS
FROM DATASET
For additonal examples on CASE, take a look at this site: https://www.w3schools.com/sql/sql_ref_case.asp
For additional examples on DATEDIFF, take a look here: See the
following webpage for examples and a chance to try it out:
https://www.w3schools.com/sql/func_sqlserver_datediff.asp
SELECT CASE
WHEN Datediff(day, [transaction date], Getdate()) <= 180 THEN 'reorder'
ELSE 'Neworder'
END,
[transaction date] AS Date,
[sku] AS Product,
[customer name] AS Customer,
qty AS Units
FROM datase
If I understand correctly, you want to peak at the previous date and make a comparison. This suggests lag():
select (case when lag([Transaction Date]) over (partition by SKU, [Customer Name] order by [Transaction Date]) >
dateadd(month, -6, [Transaction Date])
then 'Reorder'
else 'New Order'
end) as Order_Type
[Transaction Date] as Date,
[SKU] as Product,
[Customer Name] as Customer,
sum(Qty) as Units
from dataset d
group by [Transaction Date], [SKU], [Customer Name];
EDIT:
In SQL Server 2008, you can emulate the LAG() using OUTER APPLY:
select (case when dprev.[Transaction Date] >
dateadd(month, -6, d.[Transaction Date])
then 'Reorder'
else 'New Order'
end) as Order_Type
d.[Transaction Date] as Date,
d.[SKU] as Product,
d.[Customer Name] as Customer,
sum(d.Qty) as Units
from dataset d outer apply
(select top (1) dprev.*
from dataset dprev
where dprev.SKU = d.SKU and
dprev.[Customer Name] = d.[Customer Name] and
dprev.[Transaction Date] < d.[Transaction Date]
order by dprev.[Transaction Date] desc
) dprev
group by d.[Transaction Date], d.[SKU], d.[Customer Name];

SQL Select Same Table Multiple Sums

I am trying to wrap my head round this issue and I am sure the answer exists here a million times but then I am not searching for the right question.
I have a huge sales table [SALES] and I am extracting
SELECT DISTINCT S1.[ORDER ID], S1.SUPPLIER, SUM(S1.[ORDER TOTAL]) AS SUPPLIERTOTAL
FROM [SALES] S1
LEFT JOIN
(
Select s2.[Order ID], S2.[Supplier], S2.[Supplier Colour], SUM(S2.[Order TOTAL]) AS COLOURTOTAL
FROM [SALES]
WHERE [SALES].[SALESDATE] Between '20160101' and '20170101'
) AS s2
ON s1.[Order ID] = s2.[Order ID]
I have thrown this code together as an illustration as I am not by my work PC at present. My issue is that when I do get the re-select to work it produces the correct order value from the first select.
E.G Lets say the manufacturer was Ford and the total value was 100000 over ten orders it returns the 100000 correctly however on the sub select it appears to take the total value and multiply it by the total number of rows in the table. I am trying to work out what is going on with the data and query but cannot see the issue.
The only factor if its of influence is that the table has no primary key but as I am providing referential integrity with the join didn't believe that would be the case...
Anyone able to answer or come across this issue>
A bit of guessing here, as the question is not too clear, but I think you are looking for something like this:
SELECT S1.[ORDER ID], S1.SUPPLIER, SUM(S1.[ORDER TOTAL]) AS SUPPLIERTOTAL, SUM(S2.COLOURTOTAL) as COLOURTOTAL
FROM [SALES] S1
LEFT JOIN
(
Select s2.[Order ID], S2.[Supplier], S2.[Supplier Colour], S2.[Order TOTAL] AS COLOURTOTAL
FROM [SALES]
WHERE [SALES].[SALESDATE] Between '20160101' and '20170101'
) AS s2
ON s1.[Order ID] = s2.[Order ID]
GROUP BY S1.[ORDER ID], S1.SUPPLIER

Extract a specific line from a SELECT statement based on the last Trasanction date

Good day,
I have an SQL code that return to me all quantities that I received over time, but I want to display only the latest one
SELECT * FROM
(SELECT DISTINCT
[dbo].[ttcibd001110].[t_cmnf] AS [Manufacturer],
[dbo].[ttcibd001110].[t_item] AS [Item code],
[dbo].[ttcibd001110].[t_dsca] AS [Description],
[dbo].[ttcibd001110].[t_seak] AS [Search key 1],
[dbo].[twhinr110110].[t_trdt] AS [Transaction date],
[dbo].[twhinr110110].[t_cwar] AS [Warehouse],
[dbo].[twhinr110110].[t_qstk] AS [Quantity Inventory Unit]
FROM [dbo].[twhinr110110] LEFT JOIN [dbo].[ttcibd001110]
ON [dbo].[twhinr110110].[t_item]=[dbo].[ttcibd001110].[t_item]
WHERE [dbo].[twhinr110110].[t_koor]='2' AND [dbo].[ttcibd001110].[t_cmnf]='ManufacturerX') AS tabel
WHERE ltrim(tabel.[Item code])='1000045'
Now, from this selection I want to select only the line with the latest Transaction date, but I am stuck.
Can somebody help me in this way?
Thank you!
Change your beginning to
SELECT TOP 1
and after where use
ORDER BY [Transaction date] DESC

Column invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause

We have a table which will capture the swipe record of each employee. I am trying to write a query to fetch the list of distinct employee record by the first swipe for today.
We are saving the swipe date info in datetime column. Here is my query its throwing exception.
select distinct
[employee number], [Employee First Name]
,[Employee Last Name]
,min([DateTime])
,[Card Number]
,[Reader Name]
,[Status]
,[Location]
from
[Interface].[dbo].[VwEmpSwipeDetail]
group by
[employee number]
where
[datetime] = CURDATE();
Getting error:
Column 'Interface.dbo.VwEmpSwipeDetail.Employee First Name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Any help please?
Thanks in advance.
The error says it all:
...Employee First Name' is invalid in the select list because it is not contained
in either an aggregate function or the GROUP BY clause
Saying that, there are other columns that need attention too.
Either reduce the columns returned to only those needed or include the columns in your GROUP BY clause or add aggregate functions (MIN/MAX). Also, your WHERE clause should be placed before the GROUP BY.
Try:
select distinct [employee number]
,[Employee First Name]
,[Employee Last Name]
,min([DateTime])
,[Card Number]
,min([Reader Name])
from [Interface].[dbo].[VwEmpSwipeDetail]
where CAST([datetime] AS DATE)=CAST(GETDATE() AS DATE)
group by [employee number], [Employee First Name], [Employee Last Name], [Card Number]
I've removed status and location as this is likely to return non-distinct values. In order to return this data, you may need a subquery (or CTE) that first gets the unique IDs of the SwipeDetails table, and from this list you can join on to the other data, something like:
SELECT [employee number],[Employee First Name],[Employee Last Name].. -- other columns
FROM [YOUR_TABLE]
WHERE SwipeDetailID IN (SELECT MIN(SwipeDetailsId) as SwipeId
FROM SwipeDetailTable
WHERE CAST([datetime] AS DATE)=CAST(GETDATE() AS DATE)
GROUP BY [employee number])
Please Try Below Query :
select distinct [employee number],[Employee First Name]
,[Employee Last Name]
,min([DateTime])
,[Card Number]
,[Reader Name]
,[Status]
,[Location] from [Interface].[dbo].[VwEmpSwipeDetail] group by [employee number],[Employee First Name]
,[Employee Last Name]
,[Card Number]
,[Reader Name]
,[Status]
,[Location] having [datetime]=GetDate();
First find the first timestamp for each employee on the given day (CURDATE), then join back to the main table to get all the details:
WITH x AS (
SELECT [employee number], MIN([datetime] AS minDate
FROM [Interface].[dbo].[VwEmpSwipeDetail]
WHERE CAST([datetime] AS DATE) = CURDATE()
GROUP BY [employee number]
)
select [employee number]
,[Employee First Name]
,[Employee Last Name]
,[DateTime]
,[Card Number]
,[Reader Name]
,[Status]
,[Location]
from [Interface].[dbo].[VwEmpSwipeDetail] y
JOIN x ON (x.[employee number] = y.[employee number] AND x.[minDate] =Y.[datetime]
This should not be marked as mysql as this would not happen in mysql.
sql-server does not know which of the grouped [Employee First Name] values to return so you need to add an aggregate (even if you only actually expect one result). min/max will both work in that case. The same would apply to all the other rows where they are not in the GROUP BY or have an aggregate function (EG min) around them.

SQL aggregate and other fields showing in query

I have a query, where I need the MIN of a DateTime field and then I need the value of a corresponding field in the same row.
Now, I have something like this, however I cannot get Price field without putting it also in an aggregate clause, which is not what I want.
SELECT MIN([Registration Time]), Price FROM MyData WHERE [Product Series] = 'XXXXX'
I need the MIN of the Registration Time field and then I just want the corresponding Price field for that row, however how do I show that?
I do also need my WHERE clause as shown.
I'm sure I've overlooked something really obvious. Using SQL Server 2008
If you want just one record with [Registration Time], Price, it'd be as simple as this:
select top 1 [Registration Time], Price
from MyData
where [Product Series] = 'XXXXX'
order by [Registration Time]
If you want minimum [Registration Time] and corresponding Price for all [Product Series], then there's a few approaches, for example, using row_number() function:
with cte as (
select
[Registration Time], Price,
row_number() over(partition by [Product Series] order by [Registration Time]) as rn
from MyData
)
select
[Registration Time], Price, [Product Series]
where rn = 1