An equivalent expression for MIN_BY in SQL?

In SQL we have the function MIN_BY(B,C), which returns the value of B at the minimum of C.
How would one get the same functionality, but without using the MIN_BY function?
i.e. given columns A, B, C, I want to group by A and return the value of B that corresponds to the minimum of C. I can see there must be some way to do it using OVER and PARTITION BY, but I am not well versed enough in them to see how!

One method uses window functions:
select a, min(min_bc)
from (select t.*, min(b) over (partition by a order by c) as min_bc
      from t
     ) t
group by a;

Just to understand:
SETUP:
create table Test (a int, b int, c int);
insert into test values(1,2,3);
insert into test values(1,3,2);
insert into test values(1,4,1);
insert into test values(2,4,5);
insert into test values(2,8,4);
QUERY (using min(b) to break ties when multiple rows share the minimum of c):
select a, min(b) from Test t
where c = (select min(c) from Test b where b.a = t.a)
group by a
RESULT:
A MIN(B)
1 4
2 8
RESULT of Gordon Linoff's query:
A MIN(MIN_BC)
1 2
2 4
Who's right, who's wrong, and why?
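For what it's worth: MIN(b) OVER (... ORDER BY c) in the answer above is a cumulative minimum of b, so taking MIN of it per group collapses to a plain per-group MIN(b), which is why the two result sets differ. A window-function variant that does return b at the minimum c uses FIRST_VALUE instead (an untested sketch, assuming your dialect supports FIRST_VALUE; the extra ORDER BY b breaks ties the same way as the min(b) comparison query):
select a, min(b_at_min_c)
from (select t.*,
             first_value(b) over (partition by a order by c, b) as b_at_min_c
      from t
     ) t
group by a;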

If you don't want to use a subquery or window functions, I wrote an alternative here (works best for INT types as the comparator):
https://stackoverflow.com/a/75289042/1997873
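That answer isn't reproduced here, but one common subquery-free trick for integer comparators (not necessarily the one behind the link) is to pack the comparator and the payload into a single number, aggregate, then unpack. A sketch, assuming b and c are non-negative integers below 1000000:
select a,
       min(c * 1000000 + b) % 1000000 as b_at_min_c -- unpack b after MIN
from t
group by a;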

Related

SELECT VALUES in Teradata

I know that it's possible in other SQL flavors (T-SQL) to "select" provided data without a table. Like:
SELECT *
FROM (VALUES (1,2), (3,4)) tbl
How can I do this using Teradata?
Teradata has strange syntax for this:
select t.*
from (select * from (select 1 as a, 2 as b) x
      union all
      select * from (select 3 as a, 4 as b) x
     ) t;
I don't have access to a TD system to test, but you might be able to remove one of the nested SELECTs from the answer above:
select x.*
from (select 1 as a, 2 as b
      union all
      select 3 as a, 4 as b
     ) x
If you need to generate some random rows, you can always do a SELECT from a system table, like sys_calendar.calendar:
SELECT 1, 2
FROM sys_calendar.calendar
SAMPLE 10;
Updated example:
SELECT TOP 1000 -- Limit to 1000 rows (you can use SAMPLE too)
ROW_NUMBER() OVER() MyNum, -- Sequential numbering
MyNum MOD 7, -- Modulo operator
RANDOM(1,1000), -- Random number between 1 and 1000
HASHROW(MyNum) -- Rowhash value of given column(s)
FROM sys_calendar.calendar; -- Use as table to source rows
A couple notes:
make sure you pick a system table that will always be present and have rows
if you need more rows than are available in the source table, do a UNION to get more rows
you can always easily create a one-column table and populate it to whatever number of rows you want by INSERT/SELECT into it:
CREATE TABLE DummyTable (c1 INT); -- Create table
INSERT INTO DummyTable VALUES (1); -- Seed table
INSERT INTO DummyTable SELECT * FROM DummyTable; -- Run this to duplicate rows as many times as you want
Then use this table to create whatever resultset you want, similar to the query above with sys_calendar.calendar.
I don't have a TD system to test so you might get syntax errors...but that should give you a basic idea.
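With the same untested caveat, once DummyTable has been doubled up enough times you can use it as a row source just like sys_calendar.calendar, for example:
SELECT TOP 100
       ROW_NUMBER() OVER (ORDER BY c1) AS n, -- 1..100
       RANDOM(1, 6) AS die_roll              -- random integer from 1 to 6
FROM DummyTable;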
I am a bit late to this thread, but I recently ran into the same problem.
I solved this by simply using
select distinct 1 as a, 2 as b from DBC.tables
union all
select distinct 3 as a, 4 as b from DBC.tables
Here, DBC.tables is a backend dictionary table with only a few rows, so the query runs fast as well.

Insert with if condition

I need to create an SQL statement that does the following: let's say I have two tables A and B containing integer fields. What I have to achieve is:
if (!(C is contained in A)) insert C into B
I'm using SQLite. Thanks for the help.
Actually, in your specific case it may happen to be as easy as
insert into B (c_value) select c_value from A where c_value = #your_c_value_here
see INSERT statement
Sorry, I hadn't noticed the negation in your question for the "C in A" condition.
I have another option for you
with temp_val as (select #your_val_goes_here as val)
insert into b
select val
from temp_val
where not exists (select 1 from a where c = val)
check out this fiddle
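If you'd rather not use the CTE, the same guard is often written as a single INSERT ... SELECT ... WHERE NOT EXISTS, which SQLite accepts even without a FROM clause. A minimal sketch, with 42 standing in for your C value and assumed column names:
INSERT INTO B (c_value)
SELECT 42
WHERE NOT EXISTS (SELECT 1 FROM A WHERE A.c_value = 42);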

SQL Join on sequence number

I have 2 tables (A, B). They each have a different column that is basically an order or a sequence number. Table A has 'Sequence' and the values range from 0 to 5. Table B has 'Index' and the values are 16740, 16744, 16759, 16828, 16838, and 16990. Unfortunately I do not know the significance of these values. But I do believe they will always match in sequential order. I want to join these tables on these numbers where 0 = 16740, 1 = 16744, etc. Any ideas?
Thanks
You could use a case expression to convert table a's values to table b's values (or vice versa) and join on that:
SELECT *
FROM a
JOIN b ON a.[sequence] = CASE b.[index] WHEN 16740 THEN 0
                                        WHEN 16744 THEN 1
                                        WHEN 16759 THEN 2
                                        WHEN 16828 THEN 3
                                        WHEN 16838 THEN 4
                                        WHEN 16990 THEN 5
                                        ELSE NULL
                         END;
@Mureinik has a great example. If down the road you end up adding more numbers, putting this mapping into a new table might be a good idea.
CREATE TABLE C (
    AInfo INT,
    BInfo INT
);
INSERT INTO C (AInfo, BInfo) VALUES (0, 16740);
INSERT INTO C (AInfo, BInfo) VALUES (1, 16744);
etc
Then you can join all the tables.
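A sketch of that join, assuming the column names above and the [Sequence]/[Index] columns from the question:
SELECT a.*, b.*
FROM A a
JOIN C c ON c.AInfo = a.[Sequence]
JOIN B b ON b.[Index] = c.BInfo;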
If the values are in ascending order as per your example, you can use the ROW_NUMBER() function to achieve this:
;with cte AS (SELECT *, ROW_NUMBER() OVER(ORDER BY [Index])-1 RN
FROM B)
SELECT *
FROM cte
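To finish the lookup you would then join the numbered rows back to table A on its Sequence column, along these lines (sketch):
;WITH cte AS (SELECT *, ROW_NUMBER() OVER (ORDER BY [Index]) - 1 AS RN
              FROM B)
SELECT *
FROM A
JOIN cte ON A.[Sequence] = cte.RN;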

SQL Query combine two rows into one row of data

I have a table (see the image below -- red box) that describes my data; A, B, C, and D are the columns. The structure will always be like this: if column A is Type_1, only column B has content, while if column A is Type_2, columns C and D have content and column B is NULL.
The table enclosed in the green box is my desired output.
My experience building SELECT statements is not very extensive, and I'm almost leaning towards creating two separate tables to get my desired result (like one table for Type_1 data only and another table for Type_2 data only).
The question is: is it possible to query two rows and combine them into a single output row using a SELECT query, considering that these two rows are in the same table?
Thanks.
Something like this:
SELECT
    Table2Id,
    MAX(B) AS B,
    MAX(C) AS C,
    MAX(D) AS D
FROM tbl
WHERE A != 'Type_3'
GROUP BY Table2Id
Assuming that there is only one row of data for Type_1 and one row of data for Type_2, you can use the following:
SELECT Id, MAX(B) AS B, MAX(C) AS C, MAX(D) AS D
FROM Table2
WHERE A IN ('Type_1','Type_2')
GROUP BY Id
Example in this SQL Fiddle
You can make subqueries by enclosing them in parentheses. As in:
SELECT (SELECT TOP 1 B FROM table ORDER BY some_ordering),
       (SELECT TOP 1 C FROM table WHERE NOT C IS NULL),
       D
FROM table
The queries inside the parentheses can apply to any table, and can use the data from the main query in calculations of the selected values and in filters.
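Applied to the layout in this question, a correlated version might look like this (a sketch; the tbl/Table2Id names follow the first answer, and MAX simply skips the NULLs so each subquery returns the one populated value per id):
SELECT DISTINCT t.Table2Id,
       (SELECT MAX(x.B) FROM tbl x WHERE x.Table2Id = t.Table2Id) AS B,
       (SELECT MAX(x.C) FROM tbl x WHERE x.Table2Id = t.Table2Id) AS C,
       (SELECT MAX(x.D) FROM tbl x WHERE x.Table2Id = t.Table2Id) AS D
FROM tbl t;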

How to select only one full row per group in a "group by" query?

In SQL Server, I have a table where a column A stores some data. This data can contain duplicates (i.e. two or more rows will have the same value for the column A).
I can easily find the duplicates by doing:
select A, count(A) as CountDuplicates
from TableName
group by A having (count(A) > 1)
Now, I want to retrieve the values of other columns, let's say B and C. Of course, those B and C values can be different even for the rows sharing the same A value, but it doesn't matter for me. I just want any B value and any C value: the first, the last, or a random one.
If I had a small table and one or two columns to retrieve, I would do something like:
select A, count(A) as CountDuplicates,
    (select top 1 child.B from TableName as child where child.A = base.A) as B
from TableName as base
group by A
having (count(A) > 1)
The problem is that I have many more rows to get, and the table is quite big, so having several child selects will have a high performance cost.
So, is there a less ugly, pure SQL solution to do this?
Not sure if my question is clear enough, so I'll give an example based on the AdventureWorks database. Let's say I want to list available States, and for each State, get its code, a city (any city) and an address (any address). The easiest, and most inefficient, way to do it would be:
var q = from c in data.StateProvinces select new { c.StateProvinceCode, c.Addresses.First().City, c.Addresses.First().AddressLine1 };
in LINQ-to-SQL, and it will do two selects for each of the 181 States, so 363 selects. In my case, I am searching for a way to have a maximum of 182 selects.
The ROW_NUMBER function in a CTE is the way to do this. For example:
DECLARE #mytab TABLE (A INT, B INT, C INT)
INSERT INTO #mytab ( A, B, C ) VALUES (1, 1, 1)
INSERT INTO #mytab ( A, B, C ) VALUES (1, 1, 2)
INSERT INTO #mytab ( A, B, C ) VALUES (1, 2, 1)
INSERT INTO #mytab ( A, B, C ) VALUES (1, 3, 1)
INSERT INTO #mytab ( A, B, C ) VALUES (2, 2, 2)
INSERT INTO #mytab ( A, B, C ) VALUES (3, 3, 1)
INSERT INTO #mytab ( A, B, C ) VALUES (3, 3, 2)
INSERT INTO #mytab ( A, B, C ) VALUES (3, 3, 3)
;WITH numbered AS
(
SELECT *, rn=ROW_NUMBER() OVER (PARTITION BY A ORDER BY B, C)
FROM #mytab AS m
)
SELECT *
FROM numbered
WHERE rn=1
As I mentioned in my comment to HLGEM and Philip Kelley, their simple use of an aggregate function does not necessarily return one "solid" record for each A group; instead, it may return column values from many separate rows, all stitched together as if they were a single record. For example, if this were a PERSON table, with the PersonID being the "A" column, and distinct contact records (say, Home and Work), you might wind up returning the person's home city, but their office ZIP code -- and that's clearly asking for trouble.
The use of the ROW_NUMBER, in conjunction with a CTE here, is a little difficult to get used to at first because the syntax is awkward. But it's becoming a pretty common pattern, so it's good to get to know it.
In my sample I've defined a CTE that tacks on an extra column rn (standing for "row number") to the table, partitioned by the A column. A SELECT on that result, filtering to only those rows having a row number of 1 (i.e., the first record found for that value of A), returns a "solid" record for each A group -- in my example above, you'd be certain to get either the Work or Home address, but not elements of both mixed together.
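Applied to the AdventureWorks example from the question, the same pattern would look roughly like this (a sketch assuming the Person.StateProvince and Person.Address tables joined on StateProvinceID; AddressID is only there to make the "any address" pick deterministic):
;WITH numbered AS
(
    SELECT sp.StateProvinceCode, a.City, a.AddressLine1,
           ROW_NUMBER() OVER (PARTITION BY sp.StateProvinceID
                              ORDER BY a.AddressID) AS rn
    FROM Person.StateProvince AS sp
    JOIN Person.Address AS a ON a.StateProvinceID = sp.StateProvinceID
)
SELECT StateProvinceCode, City, AddressLine1
FROM numbered
WHERE rn = 1;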
It concerns me that you want any old value for fields b and c. If they are to be meaningless, why are you returning them?
If it truly doesn't matter (and I honestly can't imagine a case where I would ever want this, but it's what you said) and the values for b and c don't even have to be from the same record, group by with the use of min or max is the way to go. It's more complicated if you want the values for a particular record for all fields.
select A, count(A) as CountDuplicates, min(B) as B , min(C) as C
from TableName as base
group by A
having (count(A) > 1)
You can do something like this if you have an id as the primary key in your table:
select t.id, t.b, t.c
from TableName t
inner join
(
    select min(id) as id, count(A) as CountDuplicates
    from TableName
    group by A
    having (count(A) > 1)
) d on t.id = d.id