SQL Server 2016: Getting distinct results when limited to a WHERE clause

I am looking for a distinct list of the CUSTOMER_NAME field from my table. Normally I would simply do
SELECT
distinct
[CUSTOMER_NAME]
FROM [iData3].[dbo].[N241650]
or
SELECT
[CUSTOMER_NAME]
FROM [iData3].[dbo].[N241650]
Group by [CUSTOMER_NAME]
But I am limited in my query. Due to software restrictions, the query can only be of the form
SELECT * from
[iData3].[dbo].[N241650]
where ...
How do I get a distinct list of customer names given these restrictions? I essentially need to cram everything into the WHERE clause. I'm thinking possibly WHERE EXISTS or NOT EXISTS but I haven't used those conditions before so I'm not certain if they'd be useful.
"This is not possible because..." is an acceptable, if disappointing, answer.

You can use the ROW_NUMBER() function:
SELECT TOP (1) WITH TIES [CUSTOMER_NAME]
FROM [iData3].[dbo].[N241650]
ORDER BY ROW_NUMBER() OVER (PARTITION BY CUSTOMER_NAME ORDER BY ?)
? stands for an identity, primary key, or other unique column in your table.

You can group by that column to achieve the same result.
select CUSTOMER_NAME
from ...
group by CUSTOMER_NAME
order by CUSTOMER_NAME;
Another alternative is to use a stored procedure.

If you can't avoid the *, then you can't use GROUP BY, and if you only have a WHERE clause, you will need a key (a unique set of columns) to filter correctly. Otherwise you can't tell duplicates apart and will end up selecting more than one row with the same customer name.
It's a bit convoluted, but try this. It will get you one row per CUSTOMER_NAME.
SELECT
*
from
[iData3].[dbo].[N241650]
where
[N241650].KeyColumn IN
(
SELECT
Z.KeyColumn
FROM
(
SELECT
X.KeyColumn,
Ranking = ROW_NUMBER() OVER (PARTITION BY X.CUSTOMER_NAME ORDER BY X.KeyColumn ASC)
FROM
[iData3].[dbo].[N241650] AS X
WHERE
X.KeyColumn IS NOT NULL
) AS Z
WHERE
Z.Ranking = 1
)
The ORDER BY inside the OVER will determine which row you get for each CUSTOMER_NAME.
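To check the single-key query above end to end, here is a minimal sketch that runs it against an in-memory SQLite database standing in for SQL Server. The table contents and the integer `id` column (playing the role of the hypothetical KeyColumn) are assumptions for illustration, and the T-SQL alias form `Ranking = ROW_NUMBER() ...` is written as `ROW_NUMBER() ... AS Ranking` because SQLite does not accept the `alias = expr` syntax. Window functions require SQLite 3.25 or later.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; "id" is a hypothetical KeyColumn.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE N241650 (id INTEGER PRIMARY KEY, CUSTOMER_NAME TEXT)")
conn.executemany(
    "INSERT INTO N241650 (id, CUSTOMER_NAME) VALUES (?, ?)",
    [(1, "Acme"), (2, "Acme"), (3, "Globex"), (4, "Initech"), (5, "Globex")],
)

# The outer query keeps the required SELECT * ... WHERE shape; the IN subquery
# keeps only the row with the lowest id (Ranking = 1) for each CUSTOMER_NAME.
query = """
SELECT *
FROM N241650
WHERE N241650.id IN (
    SELECT Z.id
    FROM (
        SELECT X.id,
               ROW_NUMBER() OVER (PARTITION BY X.CUSTOMER_NAME ORDER BY X.id ASC) AS Ranking
        FROM N241650 AS X
        WHERE X.id IS NOT NULL
    ) AS Z
    WHERE Z.Ranking = 1
)
"""
# Sorted in Python so the check does not need an ORDER BY in the restricted query.
rows = sorted(conn.execute(query).fetchall())
print(rows)
```

Each customer name appears exactly once, and the surviving row is the one with the smallest id in its group.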
If your key has multiple columns, you will have to replace the IN with an EXISTS that matches on all of them (SQL Server does not support a multi-column IN).
SELECT
*
from
[iData3].[dbo].[N241650]
where
EXISTS (
SELECT
'key columns match'
FROM (
SELECT
X.KeyColumn1,
X.KeyColumn2,
Ranking = ROW_NUMBER() OVER (PARTITION BY X.CUSTOMER_NAME ORDER BY X.KeyColumn1 ASC)
FROM
[iData3].[dbo].[N241650] AS X
) AS Z
WHERE
Z.Ranking = 1 AND
[N241650].KeyColumn1 = Z.KeyColumn1 AND
[N241650].KeyColumn2 = Z.KeyColumn2
)
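A minimal sketch of the multi-column EXISTS variant, again with in-memory SQLite as a stand-in. The composite key (KeyColumn1, KeyColumn2) and the sample rows are assumptions. Unlike the query above, this sketch orders the window by both key columns so that the row picked per name is deterministic when KeyColumn1 alone ties.

```python
import sqlite3

# Hypothetical table whose key is the pair (KeyColumn1, KeyColumn2).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE N241650 (KeyColumn1 INTEGER, KeyColumn2 INTEGER, CUSTOMER_NAME TEXT, "
    "PRIMARY KEY (KeyColumn1, KeyColumn2))"
)
conn.executemany(
    "INSERT INTO N241650 VALUES (?, ?, ?)",
    [(1, 1, "Acme"), (1, 2, "Acme"), (2, 1, "Globex"), (3, 1, "Globex"), (3, 2, "Initech")],
)

# EXISTS correlates on every key column, which is what a multi-column IN would do.
query = """
SELECT *
FROM N241650
WHERE EXISTS (
    SELECT 'key columns match'
    FROM (
        SELECT X.KeyColumn1, X.KeyColumn2,
               ROW_NUMBER() OVER (PARTITION BY X.CUSTOMER_NAME
                                  ORDER BY X.KeyColumn1 ASC, X.KeyColumn2 ASC) AS Ranking
        FROM N241650 AS X
    ) AS Z
    WHERE Z.Ranking = 1
      AND N241650.KeyColumn1 = Z.KeyColumn1
      AND N241650.KeyColumn2 = Z.KeyColumn2
)
"""
rows = sorted(conn.execute(query).fetchall())
print(rows)
```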

You need something unique in each row. If you have that, you can use:
SELECT CUSTOMER_NAME
FROM [iData3].[dbo].[N241650]
WHERE pk = (SELECT MIN(n2.pk)
FROM [iData3].[dbo].[N241650] n2
WHERE n2.CUSTOMER_NAME = N241650.CUSTOMER_NAME
);
pk is the unique column.
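A minimal sketch of the correlated MIN(pk) approach in SQLite, assuming a hypothetical integer `pk` column and made-up names. It uses SELECT * so that it also fits the question's SELECT * ... WHERE restriction, and it needs no window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE N241650 (pk INTEGER PRIMARY KEY, CUSTOMER_NAME TEXT)")
conn.executemany(
    "INSERT INTO N241650 VALUES (?, ?)",
    [(10, "Acme"), (11, "Globex"), (12, "Acme"), (13, "Globex"), (14, "Initech")],
)

# Keep a row only if its pk is the smallest pk among rows with the same name.
query = """
SELECT *
FROM N241650
WHERE pk = (SELECT MIN(n2.pk)
            FROM N241650 n2
            WHERE n2.CUSTOMER_NAME = N241650.CUSTOMER_NAME)
"""
rows = sorted(conn.execute(query).fetchall())
print(rows)
```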

Related

How to replace IN CLAUSE USING EXISTS?

select
TV.ATTRIBUTE
FROM
TABLE_VALUE TV
WHERE
TV.NUMBERS IN (SELECT MAX(TV1.NUMBERS) FROM TABLE_VALUE TV1
WHERE TV.UNIQUE_ID=TV1.UNIQUE_ID GROUP BY UNIQUE_ID )
I'm not sure EXISTS would help here because, as you put it, for each unique_id there can be many NUMBERS values, and you want to select the attribute for the highest NUMBERS value for that particular unique_id.
EXISTS is useful when you want to check whether something ... well, exists, but that's not the case here.
You do not want EXISTS; instead, you can use the RANK or DENSE_RANK analytic functions:
SELECT attribute
FROM (
SELECT attribute,
DENSE_RANK() OVER (PARTITION BY unique_id ORDER BY numbers DESC) AS rnk
FROM table_value
)
WHERE rnk = 1
or use the MAX analytic function:
SELECT attribute
FROM (
SELECT attribute,
numbers,
MAX(numbers) OVER (PARTITION BY unique_id) AS max_numbers
FROM table_value
)
WHERE numbers = max_numbers;
Either option will only read from the table once.
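The two analytic queries above can be compared side by side. This is a minimal sketch in SQLite (3.25+ for window functions) with assumed sample data; unique_id 2 deliberately has two rows tied at the highest NUMBERS value, and both DENSE_RANK and the MAX analytic return both tied attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_value (unique_id INTEGER, numbers INTEGER, attribute TEXT)")
conn.executemany(
    "INSERT INTO table_value VALUES (?, ?, ?)",
    [(1, 5, "a"), (1, 9, "b"), (2, 3, "c"), (2, 3, "d"), (2, 1, "e")],
)

# Rank rows within each unique_id by numbers, highest first; keep rank 1.
dense_rank_query = """
SELECT attribute
FROM (
    SELECT attribute,
           DENSE_RANK() OVER (PARTITION BY unique_id ORDER BY numbers DESC) AS rnk
    FROM table_value
)
WHERE rnk = 1
"""

# Attach the per-group maximum to every row; keep rows that equal it.
max_query = """
SELECT attribute
FROM (
    SELECT attribute, numbers,
           MAX(numbers) OVER (PARTITION BY unique_id) AS max_numbers
    FROM table_value
)
WHERE numbers = max_numbers
"""

by_rank = sorted(r[0] for r in conn.execute(dense_rank_query))
by_max = sorted(r[0] for r in conn.execute(max_query))
print(by_rank, by_max)
```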
If you really do want to use EXISTS (or IN), it will be less efficient because you will query the same table twice, but you can do it with a HAVING clause:
SELECT tv.attribute
FROM table_value tv
WHERE EXISTS(
SELECT 1
FROM table_value tv1
WHERE tv1.unique_id = tv.unique_id
HAVING MAX(tv1.numbers) = tv.numbers
)

Listing multiple columns in a single row in SQL

(select ID,EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE,ROW_NUMBER() OVER(PARTITION BY EXTERNAL_TRANSACTION_ID ORDER BY ID ) AS SEQNUM
from AC_POS_TRANSACTION_TRK aptt WHERE [RESULT] ='Success'
GROUP BY ID, EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE )
Hello,
With the query above, I want to get the rows of transaction IDs that have both seqnum=1 and seqnum=2.
But if a transaction ID has no second row (seqnum=2), I don't want to get any row for that transaction ID.
Thanks!!
Something like this
Not 100% sure if this is correct without your table definition, but my understanding is that you want to EXCLUDE records if that record has an entry with seqnum=2; you can't use a WHERE clause alone because that would still return seqnum = 1.
You can use an EXISTS/NOT EXISTS or IN/NOT IN clause like this:
(select ID,EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE,ROW_NUMBER() OVER(PARTITION BY EXTERNAL_TRANSACTION_ID ORDER BY ID ) AS SEQNUM
from AC_POS_TRANSACTION_TRK aptt WHERE [RESULT] ='Success'
and not exists ( select 1 from AC_POS_TRANSACTION_TRK a where a.id = aptt.id
and a.seqnum = 2)
GROUP BY ID, EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE )
Basically, this excludes a record if a matching record exists as specified in the NOT EXISTS subquery.
One option you can try is to add a count of rows per group using the same partitioning criteria and then filter accordingly. I'm not entirely sure about your query without seeing it in context and with sample data; there's no aggregation, so why use GROUP BY?
However, you can try something along these lines:
select * from (
select ID,EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE,
Row_Number() over(partition by EXTERNAL_TRANSACTION_ID order by ID) as SEQNUM,
Count(*) over(partition by EXTERNAL_TRANSACTION_ID) Qty
from AC_POS_TRANSACTION_TRK
where [RESULT] ='Success'
)x
where SEQNUM in (1,2) and Qty>1
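A minimal sketch of the COUNT(*) OVER approach in SQLite (3.25+), with assumed sample transactions: T2 has only one successful row and is dropped, T3 loses its failed row to the RESULT filter and is then dropped too, and T4 keeps only its first two rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AC_POS_TRANSACTION_TRK (ID INTEGER PRIMARY KEY, "
    "EXTERNAL_TRANSACTION_ID TEXT, EXTERNAL_TRANSACTION_TYPE TEXT, RESULT TEXT)"
)
conn.executemany(
    "INSERT INTO AC_POS_TRANSACTION_TRK VALUES (?, ?, ?, ?)",
    [
        (1, "T1", "SALE", "Success"),
        (2, "T1", "REFUND", "Success"),
        (3, "T2", "SALE", "Success"),  # only one successful row for T2
        (4, "T3", "SALE", "Success"),
        (5, "T3", "SALE", "Failed"),   # removed by RESULT = 'Success'
        (6, "T4", "SALE", "Success"),
        (7, "T4", "VOID", "Success"),
        (8, "T4", "SALE", "Success"),  # SEQNUM 3, removed by SEQNUM IN (1, 2)
    ],
)

# Number the rows per transaction and count them, then keep the first two
# rows of transactions that have more than one successful row.
query = """
SELECT * FROM (
    SELECT ID, EXTERNAL_TRANSACTION_ID, EXTERNAL_TRANSACTION_TYPE,
           ROW_NUMBER() OVER (PARTITION BY EXTERNAL_TRANSACTION_ID ORDER BY ID) AS SEQNUM,
           COUNT(*) OVER (PARTITION BY EXTERNAL_TRANSACTION_ID) AS Qty
    FROM AC_POS_TRANSACTION_TRK
    WHERE RESULT = 'Success'
) x
WHERE SEQNUM IN (1, 2) AND Qty > 1
"""
ids = sorted(row[0] for row in conn.execute(query))
print(ids)
```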
This should do the job.
With Qry As (
-- Your original query goes here
)
Select Qry.*
From Qry
Where Exists (
Select *
From Qry Qry1
Where Qry1.EXTERNAL_TRANSACTION_ID = Qry.EXTERNAL_TRANSACTION_ID
And Qry1.SEQNUM = 1
)
And Exists (
Select *
From Qry Qry2
Where Qry2.EXTERNAL_TRANSACTION_ID = Qry.EXTERNAL_TRANSACTION_ID
And Qry2.SEQNUM = 2
)
BTW, your original query looks problematic to me. Specifically, I think that instead of using GROUP BY, those columns should be in the PARTITION BY of the OVER clause, but without knowing more about the table structures and what you're trying to achieve, I could not say for sure.

How to delete duplicate data in a table (Postgres)

I want to delete the duplicated data in a table. I know there is a way using
SELECT
fruit,
COUNT( fruit )
FROM
basket
GROUP BY
fruit
HAVING
COUNT( fruit )> 1
ORDER BY
fruit;
to find them, but I need to check that every column's value is equal, i.e. tableA.* = tableA.* (except id, which is the auto-increment primary key).
and I tried this:
SELECT
*,
COUNT( * )
FROM
myTable
GROUP BY
*
HAVING
COUNT( * )> 1
ORDER BY
id;
but it says I can't use GROUP BY *. So how can I find and delete the duplicated data (where every column's value is equal except id)?
Use:
SELECT DISTINCT *
DISTINCT removes duplicated rows from the result.
You need to try something similar to the query below. Apply PARTITION BY to the columns other than Id (since Id is an incrementing unique value); PARTITION BY should list the columns on which you want to check for duplicates.
Also refer to the Postgres documentation on row_number() and on common table expressions.
WITH DuplicateTableRows AS
(
SELECT Id, Row_Number() OVER (PARTITION BY col1, col2... ORDER BY Id) AS rn
FROM
Table1
)
DELETE FROM Table1
WHERE Id IN (SELECT Id FROM DuplicateTableRows WHERE rn > 1)
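A minimal sketch of the CTE-plus-ROW_NUMBER delete, run in SQLite (3.25+) as a stand-in for Postgres. The ROW_NUMBER column is aliased as `rn` and the DELETE reads the ids from the CTE rather than from the base table. The `basket` table with `fruit` and `color` columns is an assumed example; every column except `id` goes into PARTITION BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE basket (id INTEGER PRIMARY KEY AUTOINCREMENT, fruit TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO basket (fruit, color) VALUES (?, ?)",
    [("apple", "red"), ("apple", "red"), ("apple", "green"), ("pear", "green"), ("apple", "red")],
)

# Number the rows inside each group of identical (fruit, color) values, keep the
# lowest id (rn = 1), and delete every later copy.
conn.execute("""
WITH DuplicateTableRows AS (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY fruit, color ORDER BY id) AS rn
    FROM basket
)
DELETE FROM basket
WHERE id IN (SELECT id FROM DuplicateTableRows WHERE rn > 1)
""")
remaining = conn.execute("SELECT id, fruit, color FROM basket ORDER BY id").fetchall()
print(remaining)
```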
You can do this using JSON:
select (to_jsonb(b) - 'id')
from basket b
group by 1
having count(*) > 1;
The result is as JSON. Unfortunately, to extract the values back into a record, you need to list the columns individually.

Select all but last row in Oracle SQL

I want to pull all rows except the last one in Oracle SQL.
My table looks like this:
Prikey - Auto_increment
common - varchar
miles - int
So I want to sum all rows except the last row (ordered by primary key), grouped by common. That is, for each distinct common value, the miles are summed, excluding the last row.
Note: the question was changed after this answer was posted. The first two queries work for the original question. The last query (in the addendum) works for the updated question.
This should do the trick, though it will be a bit slow for larger tables:
SELECT prikey, authnum FROM myTable
WHERE prikey <> (SELECT MAX(prikey) FROM myTable)
ORDER BY prikey
This query is longer, but for a large table it should be faster. I'll leave it to you to decide:
SELECT * FROM (
SELECT
prikey,
authnum,
ROW_NUMBER() OVER (ORDER BY prikey DESC) AS RowRank
FROM myTable)
WHERE RowRank <> 1
ORDER BY prikey
Addendum: There was an update to the question; here's the updated answer.
SELECT
common,
SUM(miles)
FROM (
SELECT
common,
miles,
ROW_NUMBER() OVER (PARTITION BY common ORDER BY prikey DESC) AS RowRank
FROM myTable
)
WHERE RowRank <> 1
GROUP BY common
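A minimal sketch of the addendum query in SQLite (3.25+), with an assumed sample table. Group "A" sums its first two rows (10 + 20), group "B" keeps only its first row (7), and group "C", which has a single row, disappears from the result entirely because its only row is also its last.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (prikey INTEGER PRIMARY KEY AUTOINCREMENT, common TEXT, miles INTEGER)"
)
conn.executemany(
    "INSERT INTO myTable (common, miles) VALUES (?, ?)",
    [("A", 10), ("B", 7), ("A", 20), ("A", 5), ("B", 3), ("C", 100)],
)

# RowRank = 1 marks the row with the highest prikey in each group; exclude it and sum the rest.
query = """
SELECT common, SUM(miles)
FROM (
    SELECT common, miles,
           ROW_NUMBER() OVER (PARTITION BY common ORDER BY prikey DESC) AS RowRank
    FROM myTable
)
WHERE RowRank <> 1
GROUP BY common
"""
totals = dict(conn.execute(query).fetchall())
print(totals)
```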
Looks like I am a little late, but here is my contribution. It is similar to Ed Gibbs' first solution, but instead of calculating the max id for each row in the table and then comparing, I get it once using an inline view.
SELECT d1.prikey,
d1.authnum
FROM myTable d1,
(SELECT MAX(prikey) prikey FROM myTable) d2
WHERE d1.prikey != d2.prikey
At least I think this is more efficient if you want to go without the use of Analytics.
Query (using SQL Server's TOP syntax) to retrieve all the records in the table except the first and last rows:
select * from table_name
where primary_id_column not in
(
select top 1 primary_id_column from table_name order by primary_id_column asc
)
and
primary_id_column not in
(
select top 1 primary_id_column from table_name order by primary_id_column desc
)

Select a Column in SQL not in Group By

I have been trying to find some info on how to select a non-aggregate column that is not contained in the GROUP BY clause in SQL, but nothing I've found so far seems to answer my question. I have a table with three columns that I want from it. One is a create date, one is an ID that groups the records by a particular claim ID, and the final one is the PK. I want to find the record that has the max creation date in each group of claim IDs. I am selecting MAX(creation date) and the claim ID (cpe.fmgcms_cpeclaimid), and grouping by the claim ID. But I need the PK from these records (cpe.fmgcms_claimid), and if I try to add it to my select clause, I get an error. And I can't add it to my GROUP BY clause because then it will throw off my intended grouping. Does anyone know any workarounds for this? Here is a sample of my code:
Select MAX(cpe.createdon) As MaxDate, cpe.fmgcms_cpeclaimid
from Filteredfmgcms_claimpaymentestimate cpe
where cpe.createdon < 'reportstartdate'
group by cpe.fmgcms_cpeclaimid
This is the result I'd like to get:
Select MAX(cpe.createdon) As MaxDate, cpe.fmgcms_cpeclaimid, cpe.fmgcms_claimid
from Filteredfmgcms_claimpaymentestimate cpe
where cpe.createdon < 'reportstartdate'
group by cpe.fmgcms_cpeclaimid
The columns in the result set of a select query with group by clause must be:
an expression used as one of the group by criteria , or ...
an aggregate function , or ...
a literal value
So, you can't do what you want to do in a single, simple query. The first thing to do is state your problem statement in a clear way, something like:
I want to find the individual claim row bearing the most recent creation date within each group in my claims table
Given:
create table dbo.claims_table
(
claim_id int not null ,
group_id int not null ,
date_created datetime not null ,
constraint some_table_PK primary key ( claim_id ) ,
constraint some_table_AK01 unique ( group_id , claim_id ) ,
constraint some_Table_AK02 unique ( group_id , date_created )
)
The first thing to do is identify the most recent creation date for each group:
select group_id ,
date_created = max( date_created )
from dbo.claims_table
group by group_id
That gives you the selection criteria you need (1 row per group, with 2 columns: group_id and the high-water created date) to fulfill the first part of the requirement (selecting the individual row from each group). That needs to be a virtual table in your final select query:
select *
from dbo.claims_table t
join ( select group_id ,
date_created = max( date_created )
from dbo.claims_table
group by group_id
) x on x.group_id = t.group_id
and x.date_created = t.date_created
If the table is not unique by date_created within group_id (AK02), you can get duplicate rows for a given group.
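To see the duplicate-row caveat, here is a minimal sketch of the join-back query in SQLite, assuming a version of the claims table without the AK02 constraint so that two claims in group 200 can share the latest date. The T-SQL alias form `date_created = max(date_created)` is written with AS, and the dates are stored as ISO text so that MAX() compares them correctly.

```python
import sqlite3

# AK02 is deliberately omitted so that two rows in a group can share a date.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims_table (claim_id INTEGER PRIMARY KEY, group_id INTEGER NOT NULL, "
    "date_created TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO claims_table VALUES (?, ?, ?)",
    [
        (1, 100, "2024-01-01"),
        (2, 100, "2024-02-01"),
        (3, 200, "2024-01-15"),
        (4, 200, "2024-03-01"),
        (5, 200, "2024-03-01"),  # ties with claim 4 for the latest date in group 200
    ],
)

# Derived table x holds one (group_id, latest date) pair per group; joining it
# back returns every row that matches, so ties yield more than one row.
query = """
SELECT t.*
FROM claims_table t
JOIN (SELECT group_id, MAX(date_created) AS date_created
      FROM claims_table
      GROUP BY group_id) x
  ON x.group_id = t.group_id
 AND x.date_created = t.date_created
"""
latest = sorted(conn.execute(query).fetchall())
print(latest)
```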
You can do this with PARTITION and RANK:
select * from
(
select MyPK, fmgcms_cpeclaimid, createdon,
Rank() over (Partition BY fmgcms_cpeclaimid order by createdon DESC) as Rank
from Filteredfmgcms_claimpaymentestimate
where createdon < 'reportstartdate'
) tmp
where Rank = 1
The direct answer is that you can't. You must select either an aggregate or something that you are grouping by.
So, you need an alternative approach.
1) Take your current query and join the base data back to it:
SELECT
cpe.*
FROM
Filteredfmgcms_claimpaymentestimate cpe
INNER JOIN
(yourQuery) AS lookup
ON lookup.MaxDate = cpe.createdOn
AND lookup.fmgcms_cpeclaimid = cpe.fmgcms_cpeclaimid
2) Use a CTE to do it all in one go:
WITH
sequenced_data AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY fmgcms_cpeclaimid ORDER BY CreatedOn DESC) AS sequence_id
FROM
Filteredfmgcms_claimpaymentestimate
WHERE
createdon < 'reportstartdate'
)
SELECT
*
FROM
sequenced_data
WHERE
sequence_id = 1
NOTE: Using ROW_NUMBER() ensures just one record per fmgcms_cpeclaimid, even if multiple records are tied on the exact same createdon value. If you can have ties and want all records with the same createdon value, use RANK() instead.
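The ROW_NUMBER() versus RANK() difference can be checked with a minimal SQLite sketch (3.25+). The sample rows are assumptions: claim group 7 has two rows tied on its latest createdon. A hypothetical cutoff date is passed as a bound parameter in place of 'reportstartdate', and the function name is interpolated only from the two fixed strings used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Filteredfmgcms_claimpaymentestimate "
    "(fmgcms_claimid INTEGER PRIMARY KEY, fmgcms_cpeclaimid INTEGER, createdon TEXT)"
)
conn.executemany(
    "INSERT INTO Filteredfmgcms_claimpaymentestimate VALUES (?, ?, ?)",
    [(1, 7, "2024-01-01"), (2, 7, "2024-02-01"), (3, 7, "2024-02-01"), (4, 8, "2024-01-10")],
)


def latest_per_claim(func):
    # func is "ROW_NUMBER" or "RANK"; only these trusted strings are interpolated.
    sql = f"""
    WITH sequenced_data AS (
        SELECT *,
               {func}() OVER (PARTITION BY fmgcms_cpeclaimid ORDER BY createdon DESC) AS sequence_id
        FROM Filteredfmgcms_claimpaymentestimate
        WHERE createdon < ?
    )
    SELECT fmgcms_claimid FROM sequenced_data WHERE sequence_id = 1
    """
    return sorted(r[0] for r in conn.execute(sql, ("2024-06-01",)))


one_per_group = latest_per_claim("ROW_NUMBER")  # one of claims 2 or 3, plus claim 4
with_ties = latest_per_claim("RANK")            # both tied claims, plus claim 4
print(one_per_group, with_ties)
```

Which of the tied rows ROW_NUMBER() keeps is not defined; add a tie-breaking column to the ORDER BY if the choice matters.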
You can join the table on itself to get the PK:
Select cpe1.PK, cpe2.MaxDate, cpe1.fmgcms_cpeclaimid
from Filteredfmgcms_claimpaymentestimate cpe1
INNER JOIN
(
select MAX(createdon) As MaxDate, fmgcms_cpeclaimid
from Filteredfmgcms_claimpaymentestimate
where createdon < 'reportstartdate'
group by fmgcms_cpeclaimid
) cpe2
on cpe1.fmgcms_cpeclaimid = cpe2.fmgcms_cpeclaimid
and cpe1.createdon = cpe2.MaxDate
where cpe1.createdon < 'reportstartdate'
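One thing I like to do is wrap the additional columns in an aggregate function, like MAX().
It works well when you don't expect duplicate values, i.e. when each group has only one distinct value for the extra column; otherwise MAX() may return a value from a different row than the one with the latest date.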
Select MAX(cpe.createdon) As MaxDate, cpe.fmgcms_cpeclaimid, MAX(cpe.fmgcms_claimid) As fmgcms_claimid
from Filteredfmgcms_claimpaymentestimate cpe
where cpe.createdon < 'reportstartdate'
group by cpe.fmgcms_cpeclaimid
What you are asking for, Sir, is what RedFilter's answer does.
This answer also helps in understanding why GROUP BY is, in a way, a simpler version of PARTITION BY:
SQL Server: Difference between PARTITION BY and GROUP BY
Since PARTITION BY changes the way the returned value is calculated, you can (in a sense) return columns that GROUP BY cannot.
You can use something like the following:
Select X.a, X.sum_b, Y.c from (
Select X.a as a, sum(b) as sum_b from name_table X
group by X.a) X
left join name_table Y on Y.a = X.a
Example:
CREATE TABLE #products (
product_name VARCHAR(MAX),
code varchar(3),
list_price [numeric](8, 2) NOT NULL
);
INSERT INTO #products VALUES ('paku', 'ACE', 2000)
INSERT INTO #products VALUES ('paku', 'ACE', 2000)
INSERT INTO #products VALUES ('Dinding', 'ADE', 2000)
INSERT INTO #products VALUES ('Kaca', 'AKB', 2000)
INSERT INTO #products VALUES ('paku', 'ACE', 2000)
--SELECT * FROM #products
SELECT distinct x.code, x.SUM_PRICE, product_name FROM (SELECT code, SUM(list_price) as SUM_PRICE From #products
group by code)x
left join #products y on y.code=x.code
DROP TABLE #products