Limit the number of rows per value in a PostgreSQL query

I have a couple of rows where values in one column repeat, and I need to get the rows such that each value appears at most a fixed number of times.
For example, I have these rows: (1, 'a') (2, 'b') (3, 'a') (4, 'c') (5, 'b') (6, 'a'), and I limit every value in the select to 2. Then I should not get the row with ID 6, because it is a third 'a' and I limited each value to 2.
How can I do that?
Thanks for any help.

If you have just two columns, say id and val, and you want just one row per value, then aggregation is enough:
select min(id) as id, val
from mytable
group by val
If there are more columns, you can use distinct on:
select distinct on (val) t.*
from mytable t
order by val, id
Finally, if you want to be able to allow a variable number of rows per val, you can use window functions. Say you want 3 rows maximum per value:
select *
from (
select t.*, row_number() over(partition by val order by id) rn
from mytable t
) t
where rn <= 3
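To check the window-function answer against the question's own data, here is a small sketch that runs the same row_number() filter in SQLite through Python's built-in sqlite3 module. SQLite is only a stand-in for PostgreSQL here (window functions need SQLite 3.25 or later), and the helper name top_n_per_value is hypothetical. With a limit of 2 the row with id 6 is dropped.

```python
import sqlite3

def top_n_per_value(rows, n):
    """Keep at most n rows per val, preferring the lowest ids."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, val TEXT)")
    con.executemany("INSERT INTO mytable (id, val) VALUES (?, ?)", rows)
    # Number the rows inside each val group by id, then filter on that number.
    query = """
        SELECT id, val
        FROM (
            SELECT t.*, row_number() OVER (PARTITION BY val ORDER BY id) AS rn
            FROM mytable t
        ) t
        WHERE rn <= ?
        ORDER BY id
    """
    result = con.execute(query, (n,)).fetchall()
    con.close()
    return result

rows = [(1, 'a'), (2, 'b'), (3, 'a'), (4, 'c'), (5, 'b'), (6, 'a')]
print(top_n_per_value(rows, 2))
```

Passing the limit as a bound parameter means the same query serves any per-value limit, which is the flexibility the window-function version offers over min() or DISTINCT ON.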

Related

How to filter DISTINCT records and order them using the LISTAGG function

SELECT
s_id
,CASE WHEN LISTAGG(X.item_id, ',') WITHIN GROUP (ORDER BY TRY_TO_NUMBER(Z.item_pg_nbr))= '' THEN NULL
ELSE LISTAGG (X.item_id, ',') WITHIN GROUP (ORDER BY TRY_TO_NUMBER(Z.item_pg_nbr))
END AS item_id_txt
FROM table_1 X
JOIN table_2 Z
ON Z.cmn_id = X.cmn_id
WHERE s_id IN('38301','40228')
GROUP BY s_id;
When I run the above query, I'm getting the same values repeated in the ITEM_ID_TXT column. I want to display only the DISTINCT values.
S_ID ITEM_ID_TXT
38301 618444,618444,618444,618444,618444,618444,36184
40228 616162,616162,616162,616162,616162,616162,616162
I also want the concatenated values to be ordered by item_pg_nbr.
I can use DISTINCT in the LISTAGG function, but then the result won't be ordered by item_pg_nbr.
Any input on this would be appreciated.
Since you cannot use different columns for the DISTINCT and for the ORDER BY in WITHIN GROUP, one approach would be:
1. Deduplicate while grabbing the minimum item_pg_nbr.
2. LISTAGG and order by the minimum item_pg_nbr.
create or replace table T1(S_ID int, ITEM_ID int, ITEM_PG_NBR int);
insert into T1 (S_ID, ITEM_ID, ITEM_PG_NBR) values
(1, 1, 3),
(1, 2, 9), -- Adding a non-distinct ITEM_ID within group
(1, 2, 2),
(1, 3, 1),
(2, 1, 1),
(2, 2, 2),
(2, 3, 3);
with X as
(
select S_ID, ITEM_ID, min(ITEM_PG_NBR) MIN_PG_NBR
from T1 group by S_ID, ITEM_ID
)
select S_ID, listagg(ITEM_ID, ',') within group (order by MIN_PG_NBR)
from X group by S_ID
;
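The same two steps can be mirrored in plain Python to make the resulting order visible. This is only a sketch of the logic on the answer's sample rows, not Snowflake's LISTAGG, and the function name is hypothetical. Ties on the minimum page number are broken by ITEM_ID here, which is my own choice; the SQL leaves that order unspecified.

```python
from collections import defaultdict

# Sample rows from the answer: (S_ID, ITEM_ID, ITEM_PG_NBR)
T1 = [(1, 1, 3), (1, 2, 9), (1, 2, 2), (1, 3, 1),
      (2, 1, 1), (2, 2, 2), (2, 3, 3)]

def listagg_distinct_by_min_page(rows):
    # Step 1: deduplicate (S_ID, ITEM_ID), keeping the minimum ITEM_PG_NBR.
    min_pg = {}
    for s_id, item_id, pg in rows:
        key = (s_id, item_id)
        min_pg[key] = min(pg, min_pg.get(key, pg))
    # Step 2: per S_ID, order the items by that minimum page and join them.
    groups = defaultdict(list)
    for (s_id, item_id), pg in min_pg.items():
        groups[s_id].append((pg, item_id))
    return {s_id: ",".join(str(item) for _, item in sorted(items))
            for s_id, items in groups.items()}

print(listagg_distinct_by_min_page(T1))
```

For S_ID 1, ITEM_ID 2 appears with pages 9 and 2; keeping the minimum (2) places it between ITEM_ID 3 (page 1) and ITEM_ID 1 (page 3).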
I guess the question then becomes: what should happen when an ITEM_ID is duplicated within a group? Using the minimum item_pg_nbr for the ORDER BY seems logical, but you could just as easily use the max or some other value.

Make the value from every second row appear in a new third column

Let's assume my data looks like this.
Every second row holds the old (previous) value, taken from a table that stores historical data.
table 1:
id value
------------
1 a
1 b
2 c
2 d
3 a
3 b
and I want the value of every second row to appear in a new third column, like this:
table 2:
id new_value old_value
------------------------
1 a b
2 c d
3 a b
EDIT:
For clarity I'll post the skeleton of the query that produces the data I want to transform (so it's clear that I am already using WITH and can't add another one, since Oracle does not yet allow nesting of WITH elements):
Skeleton code that produces the data in table 1:
with candidates as
(
--select list of candidates
)
SELECT * FROM
(
(
--select new values
MINUS
--select old values
)
UNION
(
--select old values
MINUS
--select new values
)
)
ORDER BY id;
The goal is to end up with a list of only the ids that changed, together with their old and new values.
Thanks in advance.
Use a CTE:
;WITH CTE AS(
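-- ORDER BY ID does not decide which row of an id is new and which is old; order by a column that does, if you have one
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) RN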
FROM TableName
)
SELECT ID,
MIN(CASE WHEN RN=1 THEN [value] END) NewValue,
MIN(CASE WHEN RN=2 THEN [value] END) OldValue
FROM CTE
GROUP BY ID
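The CTE answer's pattern (number the rows per id, then pivot them with MIN(CASE ...)) can be tried in SQLite through Python's sqlite3 module. Because ORDER BY ID cannot tell the two rows of an id apart, this sketch assumes a hypothetical seq column that marks which row is new; substitute whatever column your data actually has.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# seq is a hypothetical column: 1 marks the new value, 2 the old one.
con.execute("CREATE TABLE history (id INTEGER, seq INTEGER, value TEXT)")
con.executemany(
    "INSERT INTO history (id, seq, value) VALUES (?, ?, ?)",
    [(1, 1, 'a'), (1, 2, 'b'), (2, 1, 'c'), (2, 2, 'd'), (3, 1, 'a'), (3, 2, 'b')],
)
pivot_sql = """
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY seq) AS rn
        FROM history
    )
    SELECT id,
           MIN(CASE WHEN rn = 1 THEN value END) AS new_value,
           MIN(CASE WHEN rn = 2 THEN value END) AS old_value
    FROM cte
    GROUP BY id
    ORDER BY id
"""
pivoted = con.execute(pivot_sql).fetchall()
print(pivoted)
```

MIN() is only there to collapse each group to one row: within an id, each CASE expression is non-NULL for exactly one row.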
It is quite possible that the overall query can be written in a much simpler way: join the intermediate results with the old and new values together on id, so that they end up in two different columns instead of being unioned into the same column.
WITH
candidates
AS
(
--select list of candidates
)
,CTE_NewValues
AS
(
--select new values
select id, value AS new_value
FROM candidates
WHERE ...
-- assumes id is unique, one row per id
)
,CTE_OldValues
AS
(
--select old values
select id, value AS old_value
FROM candidates
WHERE ...
-- assumes id is unique, one row per id
)
SELECT
CTE_NewValues.id
,CTE_NewValues.new_value
,CTE_OldValues.old_value
FROM
CTE_NewValues
INNER JOIN CTE_OldValues ON CTE_NewValues.id = CTE_OldValues.id
WHERE
CTE_NewValues.new_value <> CTE_OldValues.old_value
ORDER BY
CTE_NewValues.id;
If we stick to the skeleton of the query in the question, there are also many ways to do it. A self-join is likely to be less efficient than analytic functions such as ROW_NUMBER and LEAD.
Sorting just by id is not enough to unambiguously define which value is new or old. You need to have some extra column to resolve it.
You don't "nest" WITH clauses (common table expressions); you "chain" them, something like the following. As you do that, make sure to add a sort_order column so you can distinguish old and new values, if you don't already have a similar column.
WITH
candidates
AS
(
--select list of candidates
)
,CTE_YourQuery
AS
(
SELECT * FROM
(
(
--select new values
select 1 AS sort_order, id, value
MINUS
--select old values
select 1 AS sort_order, id, value
)
UNION ALL
(
--select old values
select 2 AS sort_order, id, value
MINUS
--select new values
select 2 AS sort_order, id, value
)
)
)
,CTE_RowNumber
AS
(
SELECT
id
,value AS new_value
,ROW_NUMBER() OVER (PARTITION BY id ORDER BY sort_order) AS rn
,LEAD(value) OVER (PARTITION BY id ORDER BY sort_order) AS old_value
FROM CTE_YourQuery
)
SELECT
id
,new_value
,old_value
FROM CTE_RowNumber
WHERE rn = 1
ORDER BY id;
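Here is a minimal sketch of the CTE_RowNumber step, run in SQLite via Python's sqlite3 module. The your_query table is a hypothetical stand-in for the output of CTE_YourQuery (sort_order 1 = new, 2 = old). Id 4 has only a new value, to show that LEAD then yields NULL for old_value.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical stand-in for CTE_YourQuery: sort_order 1 = new value, 2 = old value.
con.execute("CREATE TABLE your_query (sort_order INTEGER, id INTEGER, value TEXT)")
con.executemany(
    "INSERT INTO your_query (sort_order, id, value) VALUES (?, ?, ?)",
    [(1, 1, 'a'), (2, 1, 'b'), (1, 2, 'c'), (2, 2, 'd'),
     (1, 3, 'a'), (2, 3, 'b'), (1, 4, 'e')],
)
lead_sql = """
    WITH cte_rownumber AS (
        SELECT id,
               value AS new_value,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY sort_order) AS rn,
               LEAD(value) OVER (PARTITION BY id ORDER BY sort_order) AS old_value
        FROM your_query
    )
    SELECT id, new_value, old_value
    FROM cte_rownumber
    WHERE rn = 1
    ORDER BY id
"""
changes = con.execute(lead_sql).fetchall()
print(changes)
```

Keeping only rn = 1 leaves exactly one row per id, with LEAD having already pulled the old value up from the following row.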
Assuming there is some other column which defines the "order" in which the new and old values appear, you can do this:
select t1.id, t1.value as old_value, t2.value as new_value
from the_table t1
join the_table t2 on t1.id = t2.id and t1.sort_order < t2.sort_order
But you have to have some column that distinguishes the row that is considered "old" from the one that is considered "new".
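The self-join can be checked the same way in SQLite from Python. In this sketch sort_order is a hypothetical column in which the lower number marks the older row, matching the t1.sort_order < t2.sort_order condition.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# sort_order is hypothetical: the lower number is the older row.
con.execute("CREATE TABLE the_table (id INTEGER, sort_order INTEGER, value TEXT)")
con.executemany(
    "INSERT INTO the_table (id, sort_order, value) VALUES (?, ?, ?)",
    [(1, 1, 'b'), (1, 2, 'a'), (2, 1, 'd'), (2, 2, 'c'), (3, 1, 'b'), (3, 2, 'a')],
)
pairs = con.execute("""
    SELECT t1.id, t1.value AS old_value, t2.value AS new_value
    FROM the_table t1
    JOIN the_table t2 ON t1.id = t2.id AND t1.sort_order < t2.sort_order
    ORDER BY t1.id
""").fetchall()
print(pairs)
```

With exactly two rows per id this gives one pair per id. With three or more rows it returns every (older, newer) combination, so you would need an extra condition to keep only adjacent rows.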

Is it possible to ignore NULL values when using LAG() and LEAD() functions in SQL Server?

As you know, the LAG() and LEAD() analytic functions access data from a previous or next row in the same result set without the use of a self-join. But is it possible to ignore NULL values and reach the nearest NOT NULL value instead?
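It's possible using window functions.
Have a read of this article by Itzik Ben-Gan for more details.
In the code below, the CTE gets the id of the most recent row (up to and including the current one) whose col1 is NOT NULL, and the outer SELECT then looks up that row's actual col1 value. The example does not call LAG itself; it carries the last NOT NULL value forward, which is what LAG with IGNORE NULLS is usually needed for.
For example: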
-- DDL for T1
SET NOCOUNT ON;
USE tempdb;
IF OBJECT_ID(N'dbo.T1', N'U') IS NOT NULL DROP TABLE dbo.T1;
GO
CREATE TABLE dbo.T1
(
id INT NOT NULL CONSTRAINT PK_T1 PRIMARY KEY,
col1 INT NULL
);
-- Small set of sample data
TRUNCATE TABLE dbo.T1;
INSERT INTO dbo.T1(id, col1) VALUES
( 2, NULL),
( 3, 10),
( 5, -1),
( 7, NULL),
(11, NULL),
(13, -12),
(17, NULL),
(19, NULL),
(23, 1759);
;WITH C AS
(
SELECT
id,
col1,
MAX(CASE WHEN col1 IS NOT NULL THEN id END) OVER(ORDER BY id ROWS UNBOUNDED PRECEDING) AS grp
FROM dbo.T1
)
SELECT
id,
col1,
(SELECT col1 FROM dbo.T1 WHERE id = grp) lastval
FROM C;
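The same query can be run on the same sample data in SQLite from Python, as a quick sanity check of the running-MAX idea. SQLite stands in for SQL Server here, and the frame is spelled out as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T1 (id INTEGER PRIMARY KEY, col1 INTEGER)")
con.executemany(
    "INSERT INTO T1 (id, col1) VALUES (?, ?)",
    [(2, None), (3, 10), (5, -1), (7, None), (11, None),
     (13, -12), (17, None), (19, None), (23, 1759)],
)
rows = con.execute("""
    WITH C AS (
        SELECT id, col1,
               -- running MAX of the ids whose col1 is not NULL
               MAX(CASE WHEN col1 IS NOT NULL THEN id END)
                   OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
        FROM C_source
    )
    SELECT id, col1, (SELECT col1 FROM T1 WHERE id = grp) AS lastval
    FROM C
    ORDER BY id
""".replace("C_source", "T1")).fetchall()
for row in rows:
    print(row)
```

Id 2 has no earlier non-NULL row, so grp and lastval stay NULL. Every later NULL row picks up the value from the last non-NULL id before it.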
Oracle 11 supports the ignore nulls option, which does exactly what you want. Of course, your question is about SQL Server, but sometimes it is heartening to know that the functionality does exist somewhere.
It is possible to simulate this functionality. The idea is to assign each null value to a group based on the preceding non-null value. In essence, this means counting the non-null values up to and including each row. You can do this with a correlated subquery or, more simply, with a cumulative COUNT(col), because COUNT skips NULLs. Then within each group you can just use max().
I think the following does what you want. Assume that col contains NULL values and that ordering defines the order of the rows:
select t.*,
max(col) over (partition by grp) as LagOnNull
from (select t.*,
count(col) over (order by ordering rows unbounded preceding) as grp
from mytable t
) t;
The lead() version is similar, but with the ordering reversed. This also works with additional partitioning keys, but you need to add them to all the window expressions.
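The grouping idea described above (count the non-null values seen so far, then take max() within each group) can be sketched in SQLite from Python on the earlier T1 sample data. The lead variant simply counts in descending id order. As in the answer, a non-NULL row gets its own value back, so these are "fill forward" and "fill backward" results rather than strict LAG/LEAD.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T1 (id INTEGER PRIMARY KEY, col1 INTEGER)")
con.executemany(
    "INSERT INTO T1 (id, col1) VALUES (?, ?)",
    [(2, None), (3, 10), (5, -1), (7, None), (11, None),
     (13, -12), (17, None), (19, None), (23, 1759)],
)
filled = con.execute("""
    SELECT id, col1,
           MAX(col1) OVER (PARTITION BY grp_lag)  AS lag_on_null,
           MAX(col1) OVER (PARTITION BY grp_lead) AS lead_on_null
    FROM (
        SELECT id, col1,
               -- COUNT(col1) skips NULLs, so it only increases at non-NULL rows.
               COUNT(col1) OVER (ORDER BY id
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp_lag,
               COUNT(col1) OVER (ORDER BY id DESC
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp_lead
        FROM T1
    ) t
    ORDER BY id
""").fetchall()
for row in filled:
    print(row)
```

Each group contains exactly one non-NULL value (the row that started it) plus the NULL rows after it, so MAX() just returns that value.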
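Another option is to put NULL and non-NULL rows into separate partitions, so that for each non-NULL row LEAD returns the next non-NULL value (NULL rows simply get NULL):
LEAD(col1, 1) OVER (PARTITION BY CASE WHEN col1 IS NULL THEN 1 ELSE 0 END ORDER BY id)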

Subtract one table from another

I have two tables produced by two different select statements. These tables contain only one column. I would like to subtract the rows of table2 from the rows of table1, only once per row. In other words, I would like to remove only one occurrence, not all of them.
table1:
apple
apple
orange
table2:
apple
pear
result:
apple
orange
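For reference: if A = {apple, apple, orange} and B = {apple, pear}, the difference you want is {apple, orange}. A plain
select * from t1 except select * from t2
does not give that, because EXCEPT is a set operation: it removes every apple and returns only orange. Numbering the duplicates first fixes this. Try this!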
create table #t(id varchar(10))
create table #t1(id1 varchar(10))
insert into #t values('apple'),('apple'),('orange')
insert into #t1 values('apple'),('pear')
select id from
(
select *,rn=row_number()over(partition by id order by id) from #t
except
select *,rn1=row_number()over(partition by id1 order by id1) from #t1
)x
Here is an answer for an Oracle DBMS. The trick is to number the records per fruit, so that you get apple 1, apple 2, etc. Then subtract the sets, so that, for instance, apple 1 is removed and apple 2 remains. (The row_number function needs a sort order, which is not important for us, but we must specify it for syntax reasons.)
select fruit
from
(
select fruit, row_number() over (partition by fruit order by fruit) as rn
from table1
minus
select fruit, row_number() over (partition by fruit order by fruit) as rn
from table2
);
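Here is a sketch that reproduces both the plain set difference and the numbered one in SQLite (via Python's sqlite3), where EXCEPT plays the role of Oracle's MINUS. The single column is assumed to be called fruit, as in the Oracle answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (fruit TEXT)")
con.execute("CREATE TABLE table2 (fruit TEXT)")
con.executemany("INSERT INTO table1 (fruit) VALUES (?)", [("apple",), ("apple",), ("orange",)])
con.executemany("INSERT INTO table2 (fruit) VALUES (?)", [("apple",), ("pear",)])

# Plain EXCEPT: a set difference, so every apple disappears.
plain = [r[0] for r in con.execute(
    "SELECT fruit FROM table1 EXCEPT SELECT fruit FROM table2 ORDER BY fruit")]

# Numbered EXCEPT: (apple, 1) cancels out, (apple, 2) survives.
numbered = [r[0] for r in con.execute("""
    SELECT fruit FROM (
        SELECT fruit, ROW_NUMBER() OVER (PARTITION BY fruit ORDER BY fruit) AS rn FROM table1
        EXCEPT
        SELECT fruit, ROW_NUMBER() OVER (PARTITION BY fruit ORDER BY fruit) AS rn FROM table2
    )
    ORDER BY fruit
""")]
print(plain)
print(numbered)
```

Adding the occurrence number makes each duplicate a distinct row, so a set operator can cancel occurrences one for one.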
I don't think you would be able to delete the records with a single SQL query.
In my view, you will need to connect to the database from a programming language and write an algorithm like this:
for(all records in table2)
{
if(record present in table1)
{
delete from table1 where record = (record in table2) limit 1;
}
}
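The loop above could look like this when written against SQLite from Python. This is a sketch in which the single column is assumed to be called fruit. Since DELETE ... LIMIT is not portable (SQLite only accepts it when compiled with SQLITE_ENABLE_UPDATE_DELETE_LIMIT), the sketch picks one matching row through SQLite's implicit rowid instead.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (fruit TEXT)")
con.execute("CREATE TABLE table2 (fruit TEXT)")
con.executemany("INSERT INTO table1 (fruit) VALUES (?)", [("apple",), ("apple",), ("orange",)])
con.executemany("INSERT INTO table2 (fruit) VALUES (?)", [("apple",), ("pear",)])

for (fruit,) in con.execute("SELECT fruit FROM table2").fetchall():
    # Delete at most one matching row, chosen through SQLite's implicit rowid.
    # If nothing matches, the subquery yields NULL and nothing is deleted.
    con.execute(
        "DELETE FROM table1 WHERE rowid = "
        "(SELECT rowid FROM table1 WHERE fruit = ? LIMIT 1)",
        (fruit,),
    )
con.commit()
remaining = sorted(r[0] for r in con.execute("SELECT fruit FROM table1"))
print(remaining)
```

Fetching all of table2 before the loop avoids iterating a cursor while the same connection is issuing DELETE statements.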

Select DISTINCT, return entire row

I have a table with 10 columns.
I want to return all rows for which Col006 is distinct, but return all columns...
How can I do this?
If column 6 looks like this:
| Column 6 |
| item1 |
| item1 |
| item2 |
| item1 |
I want to return two rows, one of the records with item1 and the other with item2, along with all other columns.
In SQL Server 2005 and above:
;WITH q AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY col6 ORDER BY id) rn
FROM mytable
)
SELECT *
FROM q
WHERE rn = 1
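As a quick check of the ROW_NUMBER approach, here is a sketch that runs it on the question's item1/item2 example in SQLite via Python's sqlite3. The table layout (id as key, other standing in for the remaining columns) is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table: id is the key, col6 repeats, other stands in for the remaining columns.
con.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, col6 TEXT, other TEXT)")
con.executemany(
    "INSERT INTO mytable (id, col6, other) VALUES (?, ?, ?)",
    [(1, 'item1', 'w'), (2, 'item1', 'x'), (3, 'item2', 'y'), (4, 'item1', 'z')],
)
first_rows = con.execute("""
    WITH q AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY col6 ORDER BY id) AS rn
        FROM mytable
    )
    SELECT id, col6, other
    FROM q
    WHERE rn = 1
    ORDER BY col6
""").fetchall()
print(first_rows)
```

The ORDER BY id inside the window decides which row wins for each col6 value. Here it is the lowest id, and the other columns come from that same row.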
In SQL Server 2000, provided that you have a primary key column:
SELECT mt.*
FROM (
SELECT DISTINCT col6
FROM mytable
) mto
JOIN mytable mt
ON mt.id =
(
SELECT TOP 1 id
FROM mytable mti
WHERE mti.col6 = mto.col6
-- ORDER BY
-- id
-- Uncomment the lines above if the order matters
)
Update:
Check your database version and compatibility level:
SELECT @@VERSION
SELECT COMPATIBILITY_LEVEL
FROM sys.databases
WHERE name = DB_NAME()
The keyword "DISTINCT" in SQL means "unique value", but it applies to the whole select list: the query returns one row for each unique combination of the selected columns. If you select only Col006, you get as many rows as there are different values in that column; the other columns cannot be returned unless you define how to reduce them to one value per group, for example with aggregate functions such as max, min or avg.
If you meant to say you want to return all rows for which Col006 has a specific value, then use the "where Col006 = value" clause.
If you meant to say you want to return all rows for which Col006 is different from all other values of Col006, you still need to specify what that value is (see above).
If you mean that the value of Col006 can only be evaluated once all rows have been retrieved, then group by Col006 and use a "having Col006 = value" clause. This has the same effect as the "where" clause, but "where" is applied as rows are read from the underlying tables, whereas "having" is applied after grouping and aggregation, just before the result set is returned to the user.
UPDATE:
After seeing your edit, I have to point out that if you use any of the other suggestions, you will end up with arbitrary values in the other 9 columns for the rows that contain the value "item1" in Col006, because of the limitation described further up in my post.
You can group on Col006 to get the distinct values, but then you have to decide what to do with the multiple records in each group.
You can use aggregates to pick a value from the records. Example:
select Col006, min(Col001), max(Col002)
from TheTable
group by Col006
order by Col006
If you want the values to come from a specific record in each group, you have to identify it somehow. Example of using Col002 to identify the record in each group:
select t.Col006, t.Col001, t.Col002
from TheTable t
inner join (
select Col006, min(Col002) as Col002
from TheTable
group by Col006
) x on t.Col006 = x.Col006 and t.Col002 = x.Col002
order by t.Col006
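Here is a sketch of the join-to-minimum approach in SQLite via Python's sqlite3, with made-up sample values. The derived column must be aliased (min(Col002) as Col002) and the outer columns qualified with t., otherwise the join condition and the select list are ambiguous or refer to a column that does not exist.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TheTable (Col001 INTEGER, Col002 INTEGER, Col006 TEXT)")
con.executemany(
    "INSERT INTO TheTable (Col001, Col002, Col006) VALUES (?, ?, ?)",
    [(10, 4, 'item1'), (20, 2, 'item1'), (30, 3, 'item2'), (40, 7, 'item1')],
)
picked = con.execute("""
    SELECT t.Col006, t.Col001, t.Col002
    FROM TheTable t
    INNER JOIN (
        SELECT Col006, MIN(Col002) AS Col002
        FROM TheTable
        GROUP BY Col006
    ) x ON t.Col006 = x.Col006 AND t.Col002 = x.Col002
    ORDER BY t.Col006
""").fetchall()
print(picked)
```

If two rows in a group share the minimum Col002, both are returned, so Col002 should be unique within each Col006 group for this to yield exactly one row per value.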
SELECT *
FROM (SELECT DISTINCT YourDistinctField FROM YourTable) AS A
CROSS APPLY
( SELECT TOP 1 * FROM YourTable B
WHERE B.YourDistinctField = A.YourDistinctField ) AS NewTableName
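I tried the answers posted above with no luck... but this does the trick!
select * from yourTable where column6 in (select distinct column6 from yourTable);
Note that every column6 value appears in the subquery, so this returns all rows with a non-NULL column6; it does not reduce the result to one row per value.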
SELECT *
FROM harvest
GROUP BY estimated_total;
You can use GROUP BY and MIN() to get a more specific result.
Let's say that you have id as the primary key.
We want to get all the DISTINCT values of a column, say estimated_total, and we also need one sample of the complete row for each distinct value. The following query should do the trick.
SELECT *, min(id)
FROM harvest
GROUP BY estimated_total;
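Selecting * alongside GROUP BY relies on non-standard behavior, so the result depends on the database. In SQLite it is documented that when a query has a single min() or max() aggregate, the bare columns come from the row that holds that minimum or maximum; MySQL makes no such guarantee (and rejects the query when ONLY_FULL_GROUP_BY is enabled). Here is a sketch of the SQLite case via Python's sqlite3, with made-up sample rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE harvest (id INTEGER PRIMARY KEY, estimated_total INTEGER, note TEXT)")
con.executemany(
    "INSERT INTO harvest (id, estimated_total, note) VALUES (?, ?, ?)",
    [(1, 200, 'a'), (2, 100, 'b'), (3, 200, 'c'), (4, 100, 'd')],
)
# SQLite-specific: with a single min() aggregate, the bare columns come
# from the row that holds that minimum.
samples = con.execute("""
    SELECT *, min(id)
    FROM harvest
    GROUP BY estimated_total
    ORDER BY estimated_total
""").fetchall()
print(samples)
```

For portable SQL, the ROW_NUMBER() approach in the earlier answers gives the same "one full row per value" result without depending on this behavior.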
create table #temp
(C1 TINYINT,
C2 TINYINT,
C3 TINYINT,
C4 TINYINT,
C5 TINYINT,
C6 TINYINT)
INSERT INTO #temp
SELECT 1,1,1,1,1,6
UNION ALL SELECT 1,1,1,1,1,6
UNION ALL SELECT 3,1,1,1,1,3
UNION ALL SELECT 4,2,1,1,1,6
SELECT * FROM #temp
SELECT *
FROM(
SELECT ROW_NUMBER() OVER (PARTITION BY C6 Order by C1) ID,* FROM #temp
)T
WHERE ID = 1