Removing duplicate rows by selecting only those with minimum length value

I have a table with two string columns: Name and Code. Code is unique, but Name is not. Sample data:
Name Code
-------- ----
Jacket 15
Jeans 003
Jeans 26
I want to select unique rows with the smallest Code value, but not in terms of numeric value; rather, the length of the string. Of course this does not work:
SELECT Name, Min(Code) as Code
FROM Clothes
GROUP BY Name
The above code will return one row for Jeans like such:
Jeans | 003
That is correct, because as a number, 003 is less than 26. But not in my application, which cares about the length of the value, not the actual value. A value with a length of three characters is greater than a value with two characters. I actually need it to return this:
Jeans | 26
Because the length of 26 is shorter than the length of 003.
So how do I write SQL code that will select the row that has the code with the minimum length, not the actual minimum value? I tried doing this:
SELECT Name, Min(Len(Code)) as Code
FROM Clothes
GROUP BY Name
The above returns only the length, a single character, so I end up with this:
Jeans | 2

;WITH cte AS
(
SELECT Name, Code, rn = ROW_NUMBER()
OVER (PARTITION BY Name ORDER BY LEN(Code))
FROM dbo.Clothes
)
SELECT Name, Code
FROM cte
WHERE rn = 1;
If you have multiple values of code that share the same length, the choice will be arbitrary, so you can break the tie by adding an additional order by clause, e.g.
OVER (PARTITION BY Name ORDER BY LEN(Code), CONVERT(INT, Code) DESC)
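To try the ranking approach without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is a translation under assumptions, not the answer's exact code: SQLite spells LEN as LENGTH and CONVERT(INT, Code) as CAST(Code AS INTEGER), and window functions need SQLite 3.25 or newer. The table holds only the three sample rows from the question.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clothes (Name TEXT, Code TEXT)")
conn.executemany(
    "INSERT INTO Clothes VALUES (?, ?)",
    [("Jacket", "15"), ("Jeans", "003"), ("Jeans", "26")],
)

# Rank the codes of each Name by string length (shortest first);
# ties on length are broken by the larger numeric value, as in the answer.
query = """
WITH cte AS (
    SELECT Name, Code,
           ROW_NUMBER() OVER (
               PARTITION BY Name
               ORDER BY LENGTH(Code), CAST(Code AS INTEGER) DESC
           ) AS rn
    FROM Clothes
)
SELECT Name, Code FROM cte WHERE rn = 1 ORDER BY Name
"""
shortest = conn.execute(query).fetchall()
print(shortest)  # [('Jacket', '15'), ('Jeans', '26')]
```

Jeans resolves to 26 because its two-character code ranks ahead of the three-character 003.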

Try this
select clothes.name, MIN(code)
from clothes
inner join
(
SELECT
Name, Min(Len(Code)) as CodeLen
FROM
clothes
GROUP BY
Name
) results
on clothes.name = results.name
and LEN(clothes.code) = results.CodeLen
group by clothes.name

It sounds like you are trying to sort on the numeric value of the Code field. If so, the correct approach would be to cast it to INT first, and use that for sorting/min functions (in a subquery), then select the original code in your main query clause.

Related

sql group by only one column

I have read many answers but haven't found one that fits. How can I group my table in SQL by only one column, "Code"? My table:
Name Code quantity storename
name1 12345 1 A1
name1 12345 3 A2
name2 9009 40 A1
name2 9009 5 A3
name3 4004 3 A1
I want to see
Name Code quantity storename
name1 12345 4 A1
name2 9009 45 A1
name3 4004 3 A1

try this (change the order if it is important):
SELECT NAME
, SUM(QUANTITY) AS QUANTITY
, code
, MIN(storename) AS storename
FROM example
GROUP BY NAME
, code

What you are showing would be
Select Name,
Code,
Sum(Quantity) as Quantity,
Min(Storename) as Storename
From MyTable
Group by Name, Code
What is your logic on storename? I assumed the lowest (first value)

try this query
select Name, Code, count(quantity) as quantity, storename from tablename
group by Code,Name,storename

This will produce the desired output for the shown input.
SELECT min(name) name,
code,
sum(quantity) quantity,
min(storename) storename
FROM elbat
GROUP BY code;
But whether this is useful is another question. Name is probably dependent on code? (As in, code is the ID and name is the name of a product?) If it is, we won't have a problem there. But summing the quantity from all stores while associating it with only one store is probably misleading. The store in the result doesn't necessarily have the quantity in the result in stock.
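The caveat about the store column can be checked with a small sketch. It runs the GROUP BY code query on the question's sample rows using Python's built-in sqlite3 module (an assumption made for convenience; the original question does not name a database), then lists the per-store quantities for one code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE elbat (name TEXT, code TEXT, quantity INTEGER, storename TEXT)"
)
conn.executemany(
    "INSERT INTO elbat VALUES (?, ?, ?, ?)",
    [
        ("name1", "12345", 1, "A1"),
        ("name1", "12345", 3, "A2"),
        ("name2", "9009", 40, "A1"),
        ("name2", "9009", 5, "A3"),
        ("name3", "4004", 3, "A1"),
    ],
)

# One row per code: quantities are summed, name and storename use MIN.
totals = conn.execute("""
    SELECT MIN(name) AS name, code, SUM(quantity) AS quantity,
           MIN(storename) AS storename
    FROM elbat
    GROUP BY code
    ORDER BY 1
""").fetchall()

# What each store actually holds for code 12345.
per_store = conn.execute("""
    SELECT storename, quantity FROM elbat
    WHERE code = '12345'
    ORDER BY storename
""").fetchall()

print(totals)
print(per_store)
```

The grouped row for code 12345 reports a quantity of 4 next to store A1, while the per-store listing shows that A1 holds only 1 of those 4.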

You need to decide how you want to roll up the non-grouped fields within the group and then place aggregations on them.
For example, you have two stores in your dataset that are tied to the same code. If you group by code, you have to pick how the stores are presented when rolled up; generally a max or min would be used. However, you may need to include Store in your grouping, as it seems like a pretty important entity. I'll show you two examples: one with store in the grouping and one with it aggregated.
With quantity rolled up at the Code level.
SELECT
Name,
Code,
CodeQuantity = SUM(Quantity),
MaxStore = MAX(StoreName)
FROM
MyTable
GROUP BY
Name,
Code
With quantity rolled up at the store level.
SELECT
Name,
Code,
StoreName,
StoreQuantity = SUM(Quantity)
FROM
MyTable
GROUP BY
Name,
Code,
StoreName

SELECT if any row within ID contains specific value

I have a DB with IDs and their classification (and many more columns that are irrelevant for now).
Due to differences in the unused columns, one ID may have different classifications, like:
ID Classification
1001 A
1001 A
1002 A
1002 A
1002 B
What I need is to group IDs and assign a classification by the rule: "If any of the rows for this ID is classified 'B', then the group (a single row for this ID) is classified 'B'; otherwise it is classified 'A'."
So that ID 1001 = A, but ID 1002 = B.
I am aware of the WHERE tab.field = ANY() construct, but I have the reverse situation: the left side of the comparison should be ANY, while the right side should be hardcoded. I assume that, since the comparison result is boolean, the order of the left and right sides is irrelevant, but I cannot figure out the query-subquery relationship.
Please help

You can use the count window function to do this.
select distinct id
,case when count(case when classification='B' then 1 end) over(partition by id) >=1 then 'B' else 'A' end as classified
from t
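As a quick check of the conditional-count idea, the sketch below runs the query on the question's sample rows with Python's built-in sqlite3 module (an assumed environment; window functions need SQLite 3.25 or newer). The second query is a hypothetical GROUP BY variant that is not taken from the answers; it applies the same "any B wins" rule without a window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, classification TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1001, "A"), (1001, "A"), (1002, "A"), (1002, "A"), (1002, "B")],
)

# Window-function query from the answer; ORDER BY added for stable output.
windowed = conn.execute("""
    SELECT DISTINCT id,
           CASE WHEN COUNT(CASE WHEN classification = 'B' THEN 1 END)
                     OVER (PARTITION BY id) >= 1
                THEN 'B' ELSE 'A' END AS classified
    FROM t
    ORDER BY id
""").fetchall()

# Hypothetical alternative: count the 'B' rows per id with a plain GROUP BY.
grouped = conn.execute("""
    SELECT id,
           CASE WHEN SUM(CASE WHEN classification = 'B' THEN 1 ELSE 0 END) > 0
                THEN 'B' ELSE 'A' END AS classified
    FROM t
    GROUP BY id
    ORDER BY id
""").fetchall()

print(windowed)  # [(1001, 'A'), (1002, 'B')]
print(grouped)   # same rows
```

Unlike taking the maximum classification, neither form depends on 'B' sorting after the other classification letters.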

For the special case where the classification of interest is also the letter with the last alphabetical position, you can simply do:
SELECT
ID,
MAX(Classification)
FROM tab
GROUP BY ID;

You can simply find max of classification for each id:
select id,
max(classification)
from your_table
group by id;

SELECT DISTINCT is not working

Let's say I have a table name TableA with the below partial data:
LOOKUP_VALUE LOOKUPS_CODE LOOKUPS_ID
------------ ------------ ----------
5% 120 1001
5% 121 1002
5% 123 1003
2% 130 2001
2% 131 2002
I wanted to select only 1 row of 5% and 1 row of 2% as a view using DISTINCT, but it fails. My query is:
SELECT DISTINCT lookup_value, lookups_code
FROM TableA;
The above query gives me the result shown below.
LOOKUP_VALUE LOOKUPS_CODE
------------ ------------
5% 120
5% 121
5% 123
2% 130
2% 131
But that is not my expected result; my expected result is shown below:
LOOKUP_VALUE LOOKUPS_CODE
------------ ------------
5% 120
2% 130
May I know how can I achieve this without specifying any WHERE clause?
Thank you!

I think you're misunderstanding the scope of DISTINCT: it will give you distinct rows, not just rows distinct on the first field.
If you want one row for each distinct LOOKUP_VALUE, you either need a WHERE clause that will work out which one of them to show, or an aggregation strategy with a GROUP BY clause plus logic in the SELECT that tells the query how to aggregate the other columns (e.g. AVG, MAX, MIN)
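The difference in scope can be seen in a short sketch that runs both forms on the question's sample data. It uses Python's built-in sqlite3 module purely for convenience (the question appears to target another database), and picks MIN as the aggregation strategy, which is one possible choice among several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA (lookup_value TEXT, lookups_code INTEGER, lookups_id INTEGER)"
)
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?)",
    [
        ("5%", 120, 1001),
        ("5%", 121, 1002),
        ("5%", 123, 1003),
        ("2%", 130, 2001),
        ("2%", 131, 2002),
    ],
)

# DISTINCT applies to the whole (lookup_value, lookups_code) pair,
# and every pair here is already unique.
distinct_rows = conn.execute(
    "SELECT DISTINCT lookup_value, lookups_code FROM TableA"
).fetchall()

# GROUP BY collapses to one row per lookup_value; MIN picks the code to show.
grouped_rows = conn.execute("""
    SELECT lookup_value, MIN(lookups_code) AS lookups_code
    FROM TableA
    GROUP BY lookup_value
    ORDER BY 2
""").fetchall()

print(len(distinct_rows))  # 5
print(grouped_rows)        # [('5%', 120), ('2%', 130)]
```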

Here's my guess at your problem - when you say
"The above query give me the result as shown in the data table above."
this is simply not true - please try it and update your question accordingly.
I am speculating here: I think you are trying to use "Distinct" but also output the other fields. If you run:
select distinct Field1, Field2, Field3 ...
Then your output will be "one row per distinct combination" of the 3 fields.
Try GROUP BY instead - this will let you select the Max, Min, Sum of other fields while still yielding "one row per unique combined values" for fields included in GROUP BY
The example below uses your table to return one row per LOOKUP_VALUE, along with the max and min of the remaining fields and the total record count:
select
LOOKUP_VALUE, min( LOOKUPS_CODE) LOOKUPS_CODE_min, max( LOOKUPS_CODE) LOOKUPS_CODE_max, min( LOOKUPS_ID) LOOKUPS_ID_min, max( LOOKUPS_ID) LOOKUPS_ID_max, Count(*) Record_Count
From TableA
Group by LOOKUP_VALUE

I wanted to select only 1 row of 5% and 1 row of 2%
This will get the lowest value lookups_code for each lookup_value:
SELECT lookup_value,
lookups_code
FROM (
SELECT lookup_value,
lookups_code,
ROW_NUMBER() OVER ( PARTITION BY lookup_value ORDER BY lookups_code ) AS rn
FROM TableA
)
WHERE rn = 1
You could also use GROUP BY:
SELECT lookup_value,
MIN( lookups_code ) AS lookups_code
FROM TableA
GROUP BY lookup_value

How about the MIN() function
I believe this works for your desired output, but am currently not able to test it.
SELECT Lookup_Value, MIN(LOOKUPS_CODE)
FROM TableA
GROUP BY Lookup_Value;

I'm going to take a total shot in the dark on this one, but because of the way you have named your fields it implies you are attempting to mimic the vlookup function within Microsoft Excel. If this is the case, the behavior when there are multiple matches is to pick the first match. As arbitrary as that sounds, it's the way it works.
If this is what you want, AND the first value is not necessarily the lowest (or highest, or best looking, or whatever), then the row_number analytic (window) function would probably suit your needs.
I give you a caveat that my ordering criteria is based on the database row number, which could conceivably be different than what you think. If, however, you insert them into a clean table (with a reset high water mark), then I think it's a pretty safe bet it will behave the way you want. If not, then you are better off including a field explicitly to tell it what order you want the choice to occur.
with cte as (
select
lookup_value,
lookups_code,
row_number() over (partition by lookup_value order by rownum) as rn
from
TableA
)
select
lookup_value, lookups_code
from cte
where rn = 1

sql statement to select previous rows to a search param

I'm after an SQL statement (if one exists), or a way to combine several SQL statements, to achieve the following.
I have a listbox and a search text box.
In the search box, the user would enter a surname, e.g. smith.
I then want to query the database with something like this:
select * FROM customer where surname LIKE searchparam
This would give me all the results for customers with a surname containing SMITH. Simple, right?
What I need to do is limit the results returned. This statement could give me thousands of rows if the search param was just S.
What I want is the result limited to the first 20 matches AND the 10 rows prior to the first match.
For example, SMI search:
Sives
Skimmings
Skinner
Skipper
Slater
Sloan
Slow
Small
Smallwood
Smetain
Smith ----------- This is the first match of my query. But I want the previous 10 and following 20.
Smith
Smith
Smith
Smith
Smoday
Smyth
Snedden
Snell
Snow
Sohn
Solis
Solomon
Solway
Sommer
Sommers
Soper
Sorace
Spears
Spedding
Is there any way to do this?
As few sql statements as possible.
Reason? I am creating an app for users with slow internet connections.
I am using POSTGRESQL v9
Thanks
Andrew

WITH ranked AS (
SELECT *, ROW_NUMBER() over (ORDER BY surname) AS rowNumber FROM customer
)
SELECT ranked.*
FROM ranked, (SELECT MIN(rowNumber) target FROM ranked WHERE surname LIKE searchparam) found
WHERE ranked.rowNumber BETWEEN found.target - 10 AND found.target + 20
ORDER BY ranked.rowNumber
Note that the accompanying SQL Fiddle uses the example data, with the range modified to 3 entries before and 6 entries after.
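For a self-contained run, here is a sketch of the same query against an in-memory SQLite database via Python's built-in sqlite3 module. This is an assumed stand-in for PostgreSQL (SQLite 3.25 or newer is needed for ROW_NUMBER). The surnames are a subset of the question's list, the named parameters :pattern, :before and :after are illustrative, and the range is 3 before and 6 after to keep the output short.

```python
import sqlite3

surnames = [
    "Slater", "Sloan", "Slow", "Small", "Smallwood", "Smetain",
    "Smith", "Smith", "Smoday", "Smyth", "Snedden", "Snell",
    "Snow", "Sohn", "Solis",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (surname TEXT)")
conn.executemany("INSERT INTO customer VALUES (?)", [(s,) for s in surnames])

# Number every row by surname, find the row number of the first match,
# then return the rows in a window around that row number.
query = """
WITH ranked AS (
    SELECT surname, ROW_NUMBER() OVER (ORDER BY surname) AS rowNumber
    FROM customer
)
SELECT ranked.surname
FROM ranked,
     (SELECT MIN(rowNumber) AS target
      FROM ranked
      WHERE surname LIKE :pattern) AS found
WHERE ranked.rowNumber BETWEEN found.target - :before AND found.target + :after
ORDER BY ranked.rowNumber
"""
params = {"pattern": "Smi%", "before": 3, "after": 6}
window = [row[0] for row in conn.execute(query, params)]
print(window)
```

With this data the first match is the first Smith, so the window runs from Small through Snow: three rows before the match, the match itself, and six rows after it.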

I'm assuming that you're looking for a general algorithm ...
It sounds like you're looking for a combination of finding the matches "greater than or equal to smith", and "less than smith".
For the former you'd order by surname and limit the result to 20, and for the latter you'd order by surname descending and limit to 10.
The two result sets can then be added together as arrays and reordered.
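The two-query idea described above can be sketched as follows, assuming Python's built-in sqlite3 module as a stand-in database and a hypothetical helper named surname_window. The comparison is a plain string comparison, so the sketch also assumes the search term is capitalized the same way as the stored surnames.

```python
import sqlite3

surnames = [
    "Slater", "Sloan", "Slow", "Small", "Smallwood", "Smetain",
    "Smith", "Smith", "Smoday", "Smyth", "Snedden", "Snell",
    "Snow", "Sohn", "Solis",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (surname TEXT)")
conn.executemany("INSERT INTO customer VALUES (?)", [(s,) for s in surnames])


def surname_window(conn, term, before=10, after=20):
    """Return up to `before` surnames sorting below `term`, then up to
    `after` surnames sorting at or above it, in ascending order."""
    following = conn.execute(
        "SELECT surname FROM customer WHERE surname >= ? "
        "ORDER BY surname LIMIT ?",
        (term, after),
    ).fetchall()
    preceding = conn.execute(
        "SELECT surname FROM customer WHERE surname < ? "
        "ORDER BY surname DESC LIMIT ?",
        (term, before),
    ).fetchall()
    # The preceding rows come back nearest-first, so reverse before joining.
    return [r[0] for r in reversed(preceding)] + [r[0] for r in following]


result = surname_window(conn, "Smi", before=3, after=6)
print(result)
```

Both queries can use an index on surname, and neither has to number the whole table. The cost is two round trips instead of one, which matters on a slow connection.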

I think you need to use ROW_NUMBER().
WITH cust1 AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY surname) as numRow FROM customer
)
SELECT c1.surname, c1.numRow, x.flag
FROM cust1 c1, (SELECT *,
case when numRow = (SELECT MIN(numRow) FROM cust1 WHERE surname='Smith') then 1 else 0 end as flag
FROM cust1) x
WHERE x.flag = 1 and c1.numRow BETWEEN x.numRow - 1 AND x.numRow + 1
ORDER BY c1.numRow
This works, but the flag turns out to be unnecessary; without it the query reduces to the one PinnyM posted.

A variation on @PinnyM's solution:
WITH ranked AS (
SELECT
*,
ROW_NUMBER() over (ORDER BY surname) AS rowNumber
FROM customer
),
minrank AS (
SELECT
*,
MIN(CASE WHEN surname LIKE searchparam THEN rowNumber END) OVER () AS target
FROM ranked
)
SELECT
surname
FROM minrank
WHERE rowNumber BETWEEN target - 10 AND target + 20
;
Instead of two separate calls to the ranked CTE, one to get the first match's row number and the other to read the results from, another CTE is introduced to serve both purposes. Can't speak for PostgreSQL but in SQL Server this might result in a better execution plan for the query, although in either case the real efficiency would still need to be verified by proper testing.

how to select one tuple in rows based on variable field value

I'm quite new to SQL and I'd like to write a SELECT statement that retrieves only the first row of each set, based on a column value. I'll try to make it clearer with a table example.
Here is my table data :
chip_id | sample_id
-------------------
1 | 45
1 | 55
1 | 5986
2 | 453
2 | 12
3 | 4567
3 | 9
I'd like to have a SELECT statement that fetches the first row for each of chip_id = 1, 2, 3
Like this :
chip_id | sample_id
-------------------
1 | 45 or 55 or whatever
2 | 12 or 453 ...
3 | 9 or ...
How can I do this?
Thanks

I'd probably:
set a variable =0
order your table by chip_id
read the table in row by row
if table[row]>variable, store the table[row] in a result array,increment variable
loop till done
return your result array
Though depending on your DB, query, and versions, you'll probably get unpredictable or unreliable results.
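The loop outlined above might look like the following sketch in Python. It is an interpretation rather than the answerer's code: instead of incrementing a counter, it remembers the last chip_id seen, which also copes with gaps in the ids. The function name first_per_chip is hypothetical, and the rows are the sample data from the question.

```python
rows = [
    (1, 45), (1, 55), (1, 5986),
    (2, 453), (2, 12),
    (3, 4567), (3, 9),
]


def first_per_chip(rows):
    """Keep the first (chip_id, sample_id) pair seen for each chip_id."""
    result = []
    last_seen = None
    # sorted() is stable, so rows with the same chip_id keep their input order.
    for chip_id, sample_id in sorted(rows, key=lambda r: r[0]):
        if chip_id != last_seen:
            result.append((chip_id, sample_id))
            last_seen = chip_id
    return result


print(first_per_chip(rows))  # [(1, 45), (2, 453), (3, 4567)]
```

Here "first" means first in the order of the Python list. A database gives no such guarantee unless the query has an explicit ORDER BY.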

You can get one value using row_number():
select chip_id, sample_id
from (select chip_id, sample_id,
row_number() over (partition by chip_id order by rand()) as seqnum
from your_table -- placeholder: use your own table name
) t
where seqnum = 1
This returns a random value. In SQL, tables are inherently unordered, so there is no concept of "first". You need an auto incrementing id or creation date or some way of defining "first" to get the "first".
If you have such a column, then replace rand() with the column.

Provided I understood your output, if you are using PostgreSQL 9, you can use this:
SELECT chip_id ,
string_agg(sample_id::text, ' or ')
FROM your_table
GROUP BY chip_id

You need to group your data with a GROUP BY query.
When you group, you generally want the max, the min, or some other value to represent each group. You can do sums, counts, all kinds of group operations.
For your example, you don't seem to want a specific group operation, so the query could be as simple as this one :
SELECT chip_id, MAX(sample_id)
FROM your_table
GROUP BY chip_id
This way you are retrieving the maximum sample_id for each of the chip_id.
