SQL: How to select distinct on some columns


I have a table looking something like this:
+---+------------+----------+
|ID | SomeNumber | SomeText |
+---+------------+----------+
|1 | 100 | 'hey' |
|2 | 100 | 'yo' |
|3 | 100 | 'yo' | <- Second occurrence
|4 | 200 | 'ey' |
|5 | 200 | 'hello' |
|6 | 200 | 'hello' | <- Second occurrence
|7 | 300 | 'hey' | <- Single
+---+------------+----------+
I would like to extract the rows where SomeNumber appears more than once, and where the combination of SomeNumber and SomeText is distinct. That means I would like the following:
+---+------------+----------+
|ID | SomeNumber | SomeText |
+---+------------+----------+
|1 | 100 | 'hey' |
|2 | 100 | 'yo' |
|4 | 200 | 'ey' |
|5 | 200 | 'hello' |
+---+------------+----------+
I don't know what to do here.
I need something along the lines:
SELECT t.ID, DISTINCT(t.SomeNumber, t.SomeText) --this is not possible
FROM (
SELECT mt.ID, mt.SomeNumber, mt.SomeText
FROM MyTable mt
GROUP BY mt.SomeNumber, mt.SomeText --can't without mt.ID
HAVING COUNT(*) > 1
)
Any suggestions?

Using a CTE with ROW_NUMBER() and a windowed COUNT() might get you what you need:
Create and populate sample table (Please save us this step in your future questions):
CREATE TABLE MyTable(id int, somenumber int, sometext varchar(10));
INSERT INTO MyTable VALUES
(1,100,'hey'),
(2,100,'yo'),
(3,100,'yo'),
(4,200,'ey'),
(5,200,'hello'),
(6,200,'hello'),
(7,300,'hey');
The query:
;WITH cte as
(
SELECT id,
someNumber,
someText,
ROW_NUMBER() OVER (PARTITION BY someNumber, someText ORDER BY ID) rn,
COUNT(id) OVER (PARTITION BY someNumber) rc
FROM MyTable
)
SELECT id, someNumber, someText
FROM cte
WHERE rn = 1
AND rc > 1
Results:
id someNumber someText
1 100 hey
2 100 yo
4 200 ey
5 200 hello
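To check the query above without a SQL Server instance, here is a minimal sketch that runs the same CTE against an in-memory SQLite database from Python. It assumes a SQLite build with window-function support (3.25 or later); the function name `distinct_repeated_rows` is made up for this illustration.

```python
import sqlite3


def distinct_repeated_rows():
    """Run the ROW_NUMBER/COUNT query from the answer on the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable(id int, somenumber int, sometext varchar(10))")
    conn.executemany(
        "INSERT INTO MyTable VALUES (?, ?, ?)",
        [(1, 100, "hey"), (2, 100, "yo"), (3, 100, "yo"),
         (4, 200, "ey"), (5, 200, "hello"), (6, 200, "hello"),
         (7, 300, "hey")],
    )
    rows = conn.execute("""
        WITH cte AS (
            SELECT id, someNumber, someText,
                   -- rn = 1 marks the first row of each (someNumber, someText) pair
                   ROW_NUMBER() OVER (PARTITION BY someNumber, someText ORDER BY id) AS rn,
                   -- rc counts all rows sharing the same someNumber
                   COUNT(id) OVER (PARTITION BY someNumber) AS rc
            FROM MyTable
        )
        SELECT id, someNumber, someText
        FROM cte
        WHERE rn = 1 AND rc > 1
        ORDER BY id
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in distinct_repeated_rows():
        print(row)
```

Row 7 is dropped because its someNumber occurs only once (rc = 1), and rows 3 and 6 are dropped because they are the second occurrence of their pair (rn = 2).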

Related

Return top 10 values from each combination of codes from two columns in SQL

For my analysis, I need 10 records from each combination of two columns that hold channel and category codes. For example:
|COUNT| Channel_Code | Category_Code |
|-----|--------------|---------------|
|9526 | ABC | DEF |
|4527 | ABC | JFK |
|10 | ABC | 123 |
|912 | WED | MLK |
|75 | KJJ | ONL |
|1000 | WED | DEF |
So far I have only tried filtering on a single combination:
WHERE channel_code = ABC
AND Category_Code = DEF
Sample 10;
I also tried using rownum, but no luck.
What I’m expecting the output to look like:
|RECORD NUM| Channel_Code | Category_Code |
|----------|--------------|---------------|
|1 | ABC | DEF |
|2 | ABC | DEF |
|3 | ABC | DEF |
|4 | ABC | DEF |
|5 | ABC | DEF |
|6 | ABC | DEF |
Etc… up until the 10th record. Then the next combination will start with 10 records of ABC and JFK
Is there a way to partition this in Teradata SQL? Or another possible solution. Thanks for your help!

You can use ROW_NUMBER(), as you mentioned:
SELECT
record_num, channel_code, category_code
FROM (SELECT record_num, channel_code, category_code,
ROW_NUMBER() OVER (PARTITION BY channel_code, category_code ORDER BY record_num ASC) AS rn
FROM table_name
) AS t -- the derived table needs an alias
WHERE rn <= 10
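The ROW_NUMBER pattern can be tried out on a small data set. The following is only a sketch, run against in-memory SQLite from Python rather than Teradata; the table name `records`, the function name `top_n_per_group` and the sample rows are assumptions made for the illustration, and it needs SQLite 3.25 or later for window functions.

```python
import sqlite3


def top_n_per_group(rows, n):
    """Return at most n rows per (channel_code, category_code) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records(record_num INTEGER, channel_code TEXT, category_code TEXT)"
    )
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    result = conn.execute("""
        SELECT record_num, channel_code, category_code
        FROM (SELECT record_num, channel_code, category_code,
                     -- number the rows inside each combination of codes
                     ROW_NUMBER() OVER (PARTITION BY channel_code, category_code
                                        ORDER BY record_num) AS rn
              FROM records) AS t
        WHERE rn <= ?
        ORDER BY channel_code, category_code, record_num
    """, (n,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [(1, "ABC", "DEF"), (2, "ABC", "DEF"), (3, "ABC", "DEF"),
              (4, "ABC", "JFK"), (5, "WED", "MLK"), (6, "WED", "MLK")]
    print(top_n_per_group(sample, 2))
```

Combinations with fewer than n rows simply return all of their rows.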

If you are basically trying to create these rows, you can use a cross join to a simple numbers table.
create volatile table vt_nums
(num integer)
on commit preserve rows;
insert into vt_nums values(1);
insert into vt_nums values(2);
insert into vt_nums values(3);
insert into vt_nums values(4);
insert into vt_nums values(5);
And here's some made up data to join with:
create volatile table vt_foo
(col1 varchar(10))
on commit preserve rows;
insert into vt_foo values ('a');
insert into vt_foo values ('b');
Finally:
select
vt_nums.num,
vt_foo.col1
from
vt_foo
cross join vt_nums
order by
2,1
Which will return:
num col1
1 a
2 a
3 a
4 a
5 a
1 b
2 b
3 b
4 b
5 b
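The same cross join against a numbers table can be reproduced outside Teradata. This is a rough sketch using in-memory SQLite from Python; the table names `nums` and `foo` and the function name `cross_join_numbers` are placeholders chosen for the example.

```python
import sqlite3


def cross_join_numbers(values, count):
    """Pair every value with the numbers 1..count via a cross join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nums(num INTEGER)")
    conn.execute("CREATE TABLE foo(col1 TEXT)")
    conn.executemany("INSERT INTO nums VALUES (?)", [(i,) for i in range(1, count + 1)])
    conn.executemany("INSERT INTO foo VALUES (?)", [(v,) for v in values])
    rows = conn.execute("""
        SELECT nums.num, foo.col1
        FROM foo
        CROSS JOIN nums
        ORDER BY foo.col1, nums.num
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in cross_join_numbers(["a", "b"], 5):
        print(row)
```

The result always has len(values) * count rows, since a cross join produces every pairing of the two tables.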

SQL query to split and keep only the top N values

I have the following table data:
| name |items |
--------------------
| Bob |1, 2, 3 |
| Rick |5, 3, 8, 4|
| Bill |2, 4 |
I need to create a table with a split items column, but with the limitation to have at most N items per name. E.g. for N = 3 the table should look like this:
|name |item|
-----------
|Bob |1 |
|Bob |2 |
|Bob |3 |
|Rick |5 |
|Rick |3 |
|Rick |8 |
|Bill |2 |
|Bill |4 |
 I have the following query that splits items correctly, but doesn't account for the maximum number N. What should I modify in the query (standard SQL, BigQuery) to account for N?
WITH data_split AS (
SELECT name, SPLIT(items,',') AS item
FROM (
SELECT name, items
-- A lot of additional logic here
FROM data
)
)
SELECT name, item
FROM data_split
CROSS JOIN UNNEST(data_split.item) AS item

You can try a more semi-standard way - works practically everywhere:
WITH
-- your input ...
indata(id,nam,items) AS ( -- need a sorting column "id" to keep the sort order
SELECT 1, 'Bob' ,'1,2,3' -- blanks after comma can irritate
UNION ALL SELECT 2, 'Rick','5,3,8,4' -- the splitting function below ...
UNION ALL SELECT 3, 'Bill','2,4'
)
-- real query starts here, replace comma below with "WITH" ...
,
-- exactly 3 integers
i(i) AS (
SELECT 1 -- need to add FROM DUAL , in Oracle, for example ...
UNION ALL SELECT 2
UNION ALL SELECT 3
)
SELECT
id
, nam
, SPLIT_PART(items,',',i) AS item -- SPLIT_PART exists in e.g. PostgreSQL and Vertica; the name and signature vary by DBMS
FROM indata CROSS JOIN i
WHERE SPLIT_PART(items,',',i) <> ''
ORDER BY 1, 3
;
-- out id | nam | item
-- out ----+------+------
-- out 1 | Bob | 1
-- out 1 | Bob | 2
-- out 1 | Bob | 3
-- out 2 | Rick | 3
-- out 2 | Rick | 5
-- out 2 | Rick | 8
-- out 3 | Bill | 2
-- out 3 | Bill | 4

Consider below approach (BigQuery)
select name, trim(item) item
from your_table, unnest(split(items)) item with offset
where offset < 3
Applied to the sample data in your question, this returns at most three items per name, matching the expected output.
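To make the logic of the offset filter explicit, here is a small plain-Python sketch of the same idea: split each list, keep the position of every item, and drop items whose zero-based position is N or more. The function name `split_top_n` is made up for this illustration and is not part of BigQuery.

```python
def split_top_n(rows, n):
    """Split each comma-separated items string and keep the first n items."""
    result = []
    for name, items in rows:
        # enumerate() plays the role of WITH OFFSET: positions start at 0
        for offset, item in enumerate(items.split(",")):
            if offset < n:
                # strip() mirrors TRIM(item), removing the blank after each comma
                result.append((name, item.strip()))
    return result


if __name__ == "__main__":
    data = [("Bob", "1, 2, 3"), ("Rick", "5, 3, 8, 4"), ("Bill", "2, 4")]
    for row in split_top_n(data, 3):
        print(row)
```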

How to make MS SQL select on this

I have MS SQL database table like this
TableA
+----+-----------+--------+
|ID | Table2_FK | Value |
+----+-----------+--------+
|1 | 7 | X |
|2 | 7 | Y |
|3 | 8 | X |
|4 | 8 | Z |
|5 | 9 | W |
|6 | 9 | M |
|5 | 10 | X |
|6 | 10 | Z |
+----+-----------+--------+
I want a query that returns the list of Table2_FK values that have both X and Z when I pass X and Z as the values. In this example, the result is 8 and 10.
More than two values may be passed.

You can do this with group by and having:
select table2_fk
from t
where value in ('X', 'Z')
group by table2_fk
having count(*) = 2;
If the values can be duplicated for a key value, then use count(distinct value) = 2. The "2" is the number of values in the IN list.
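A minimal sketch of the GROUP BY / HAVING approach for any number of values, run against in-memory SQLite from Python. The function name `keys_with_all_values` is an assumption for the illustration; it uses COUNT(DISTINCT Value) so that duplicated values per key do not matter, and it expects a non-empty list of wanted values.

```python
import sqlite3


def keys_with_all_values(rows, wanted):
    """Return the Table2_FK values that have every value in `wanted`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableA(ID INTEGER, Table2_FK INTEGER, Value TEXT)")
    conn.executemany("INSERT INTO TableA VALUES (?, ?, ?)", rows)
    # one "?" placeholder per wanted value for the IN list
    placeholders = ", ".join("?" for _ in wanted)
    query = f"""
        SELECT Table2_FK
        FROM TableA
        WHERE Value IN ({placeholders})
        GROUP BY Table2_FK
        HAVING COUNT(DISTINCT Value) = ?
        ORDER BY Table2_FK
    """
    # the final parameter is the number of distinct values being searched for
    params = (*wanted, len(set(wanted)))
    result = [row[0] for row in conn.execute(query, params)]
    conn.close()
    return result


if __name__ == "__main__":
    table = [(1, 7, "X"), (2, 7, "Y"), (3, 8, "X"), (4, 8, "Z"),
             (5, 9, "W"), (6, 9, "M"), (5, 10, "X"), (6, 10, "Z")]
    print(keys_with_all_values(table, ["X", "Z"]))
```

Key 7 is excluded because only one of the two wanted values (X) matches for it, so its distinct count is 1 rather than 2.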

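Try this (note that it returns every Table2_FK that has at least one of the values, so 7 is returned as well):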
select distinct Table2_FK
from TableA
where value in ('X','Z');

You can use a query as below:
Select distinct table2_fk from (
Select *, Ct = count(id) over (partition by table2_fk) from yourtable
Where [Value] in ('X','Z') -- filter first so Ct only counts matching rows
) a
Where a.Ct >= 2

You can use a query like the one below:
select
distinct Table2_FK
from TableA a
where exists (
select 1 from TableA b where b.value ='X' and a.Table2_FK =b.Table2_FK
)
and exists (
select 1 from TableA c where c.value ='Z' and a.Table2_FK =c.Table2_FK
)

Selecting one duplicate from re-occurrences with only one varying column SQL

Current State
id | val | varchar_id| uid
----------------------
1 | 1 | A4D NEWID()
1 | 2 | A3G NEWID()
2 | 1 | 7S3 NEWID()
2 | 1 | 43E NEWID()
2 | 2 | 7S3 NEWID()
2 | 2 | 431 NEWID()
3 | 1 | 432 NEWID()
3 | 2 | 43P NEWID()
Ideal state
id | val | varchar_id|
----------------------
1 | 1 | A4D NEWID()
1 | 2 | A3G NEWID()
2 | 1 | 7S3 NEWID()
2 | 2 | 7S3 NEWID()
3 | 1 | 432 NEWID()
3 | 2 | 43P NEWID()
Removing of duplicate occurrences of id + val
I have tried (pseudo code below):
SELECT *
from table
WHERE uid = MAX
GROUP BY id, val
Does anyone know of a solution to this/ am I missing something here? I do not mind which of the duplicates are returned.
Also, the version of Sybase I am using does not allow Partition x over x,y functionality.

In SQL you can do it with GROUP BY and an aggregate. Note that the WHERE clause in your pseudo code is not valid SQL.
CREATE TABLE #T (ID INT, Val INT, V_ID VARCHAR(50), uidd UNIQUEIDENTIFIER)
INSERT INTO #T VALUES
(1,1,'A4D',NEWID()),
(1,2,'A3G',NEWID()),
(2,1,'7S3',NEWID()),
(2,1,'43E',NEWID()),
(2,2,'7S3',NEWID()),
(2,2,'431',NEWID()),
(3,1,'432',NEWID()),
(3,2,'43P',NEWID())
SELECT t.id, t.Val, MAX(V_ID) AS varchar_id, MAX(uidd) AS uid
FROM #T AS t
GROUP BY id, val
ORDER BY id, val
This will give you the result
+---+----+-----------+-------------------------------------+
|id |Val |varchar_id |uid |
+---+----+-----------+-------------------------------------+
|1 |1 |A4D |5296ACE4-573A-4A7E-882F-516EA8E9DBDD |
|1 |2 |A3G |3EE82BEE-8C18-4415-BB3D-110F443409B5 |
|2 |1 |7S3 |68DBF7B3-316D-4A8B-B8AD-8825EC83585D |
|2 |2 |7S3 |01C54277-7156-47E1-9205-DD577A726196 |
|3 |1 |432 |6F53F332-FC9C-4EE1-A3D2-1D0FD002DDAF |
|3 |2 |43P |7B532EBD-E6C9-4BE4-B0F7-FCBCB9CE1D61 |
+---+----+-----------+-------------------------------------+
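One thing to keep in mind with the GROUP BY approach is that each MAX() is evaluated per column, so varchar_id and uid in one output row may come from different duplicates. If a whole original row is needed, a different technique that also avoids window functions is to pick one uid per (id, val) and join back on it. The following is only a sketch on in-memory SQLite from Python; it assumes uid is unique per row, and the table name `t`, the function name `one_row_per_key` and the short uid strings are made up for the example.

```python
import sqlite3


def one_row_per_key(rows):
    """Keep exactly one complete row per (id, val), chosen by the largest uid."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t(id INTEGER, val INTEGER, varchar_id TEXT, uid TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute("""
        SELECT t.id, t.val, t.varchar_id, t.uid
        FROM t
        JOIN (SELECT id, val, MAX(uid) AS uid   -- one surviving uid per key
              FROM t
              GROUP BY id, val) AS keep
          ON keep.id = t.id AND keep.val = t.val AND keep.uid = t.uid
        ORDER BY t.id, t.val
    """).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [(1, 1, "A4D", "u1"), (1, 2, "A3G", "u2"),
              (2, 1, "7S3", "u3"), (2, 1, "43E", "u4"),
              (2, 2, "7S3", "u5"), (2, 2, "431", "u6"),
              (3, 1, "432", "u7"), (3, 2, "43P", "u8")]
    for row in one_row_per_key(sample):
        print(row)
```

Which duplicate survives depends entirely on the uid values, which fits the requirement that any one of the duplicates may be returned.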

Multiple selects really needed?

I have the following table.
____________________________________
| carid | changeid | data1 | data2 |
|_______|__________|_______|_______|
| 1 | 1 |a |b |
| 1 | 2 |c |d |
| 1 | 3 |e |f |
| 2 | 3 |g |h |
| 2 | 2 |i |j |
| 2 | 4 |k |l |
| 3 | 5 |m |n |
| 3 | 1 |o |p |
| 4 | 6 |q |r |
| 4 | 2 |s |t |
|_______|__________|_______|_______|
I want to select the following result:
| carid | changeid | data1 | data2 |
|_______|__________|_______|_______|
| 1 | 1 |a |b |
| 1 | 2 |c |d |
| 1 | 3 |e |f |
| 3 | 5 |m |n |
| 3 | 1 |o |p |
|_______|__________|_______|_______|
In words:
If a row has changeid=1 I want to select all the rows with the same carid as the row with changeid=1.
This problem is quite easy to solve with a query using multiple selects. First select all rows with changeid=1 and take those carids and select all rows with those carids. Simple enough.
I was more wondering if it is possible to solve this problem without using multiple selects? Preferably I'm looking for a faster solution but I can try that out myself.

You can join the table back to itself
SELECT DISTINCT a.*
FROM YourTable a
INNER JOIN YourTable b ON b.carid = a.carid and b.changeid = 1
Table a is all the rows you want to output, filtered by table b which limits the set to those with changeid = 1.
This should have excellent performance as everything is done in a set oriented manner.
DISTINCT may not be necessary if changeid 1 may only occur once, and should be avoided if possible as it may introduce a significant performance hit for a large result set.
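The self-join can be checked against the sample table from the question. This is a minimal sketch on in-memory SQLite from Python; the function name `rows_for_cars_with_change` is an assumption for the illustration, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3


def rows_for_cars_with_change(rows, change_id=1):
    """Return all rows whose carid has at least one row with the given changeid."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE YourTable(carid INTEGER, changeid INTEGER, data1 TEXT, data2 TEXT)"
    )
    conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute("""
        SELECT DISTINCT a.carid, a.changeid, a.data1, a.data2
        FROM YourTable a
        -- b restricts the output to cars that have the requested changeid
        INNER JOIN YourTable b ON b.carid = a.carid AND b.changeid = ?
        ORDER BY a.carid, a.changeid
    """, (change_id,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    table = [(1, 1, "a", "b"), (1, 2, "c", "d"), (1, 3, "e", "f"),
             (2, 3, "g", "h"), (2, 2, "i", "j"), (2, 4, "k", "l"),
             (3, 5, "m", "n"), (3, 1, "o", "p"),
             (4, 6, "q", "r"), (4, 2, "s", "t")]
    for row in rows_for_cars_with_change(table):
        print(row)
```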

By multiple selects, do you mean using IN?
SELECT carid, changeid, data1, data2
FROM YourTable
WHERE carid IN (SELECT carid FROM YourTable WHERE changeid = 1)

Most databases support window functions. You can do this as:
select carid, changeid
from (select t.*,
max(case when changeid = 1 then 1 else 0 end) over
(partition by carid) as HasChangeId1
from YourTable t
) t
where HasChangeId1 = 1;
If the "1" is the minimum value for the change id, this can be simplified to:
select carid, changeid
from (select t.*,
min(changeid) over (partition by carid) as MinChangeId
from YourTable t
) t
where MinChangeId = 1;
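For comparison, here is a sketch of the windowed MAX(CASE ...) variant that also returns data1 and data2, as the question's expected result does. It runs on in-memory SQLite from Python and needs SQLite 3.25 or later; the function name `cars_with_change_windowed` and the alias `flagged` are made up for the example.

```python
import sqlite3


def cars_with_change_windowed(rows, change_id=1):
    """Flag each carid that has the given changeid, then keep the flagged rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE YourTable(carid INTEGER, changeid INTEGER, data1 TEXT, data2 TEXT)"
    )
    conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute("""
        SELECT carid, changeid, data1, data2
        FROM (SELECT t.*,
                     -- 1 for every row of a car that has the changeid, else 0
                     MAX(CASE WHEN changeid = ? THEN 1 ELSE 0 END)
                         OVER (PARTITION BY carid) AS has_change
              FROM YourTable t) AS flagged
        WHERE has_change = 1
        ORDER BY carid, changeid
    """, (change_id,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    table = [(1, 1, "a", "b"), (1, 2, "c", "d"), (1, 3, "e", "f"),
             (2, 3, "g", "h"), (2, 2, "i", "j"), (2, 4, "k", "l"),
             (3, 5, "m", "n"), (3, 1, "o", "p"),
             (4, 6, "q", "r"), (4, 2, "s", "t")]
    for row in cars_with_change_windowed(table):
        print(row)
```

Unlike the self-join, this reads the table once and cannot produce duplicate rows, so no DISTINCT is needed.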

It sounds like you're after only the combinations of carid and changeid present in the table, in which case the DISTINCT will return only the unique combinations for you. Not sure if that is what you're after but give it a go and check it for your expected behaviour...
SELECT DISTINCT CARID, CHANGEID FROM UnknownTable
