Minimum/Maximum function in T-SQL?

I am not asking about the aggregate Min/Max functions here. I would like to know if there are functions to get the min or max of two values, as in:
SELECT Maximum(a,b)
FROM Foo
If table Foo contains
a  b
1  2
4  3
Then the results should be 2 and 4.
I can do this with an IF or CASE statement, but you'd think there would be some simple math functions for this.
Thank you,
Daniel

There is not. You can write your own UDF, but UDFs can slow queries down. Another option is to UNPIVOT the data so that you can use the aggregate function. For small applications, though, CASE is best; a sketch of both options follows.
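For illustration, here is a minimal sketch of both approaches against the sample table Foo(a, b); the alias names are my own:

-- The CASE approach: a per-row maximum of two columns.
SELECT CASE WHEN a >= b THEN a ELSE b END AS max_ab
FROM Foo;

-- The unpivot-style approach: turn the two columns into rows with a
-- VALUES row constructor, then let the aggregate MAX work per row (T-SQL).
SELECT m.max_ab
FROM Foo
CROSS APPLY (SELECT MAX(v) FROM (VALUES (a), (b)) AS x(v)) AS m(max_ab);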

SQL aggregation for identical values

Suppose I have data something like this
id  project-id  thing-count  country
1   1           4            GBR
2   1           2            GBR
3   1           8            GBR
4   2           1            USA
5   2           4            USA
6   2           9            USA
I want to group the data using the project-id and keep the country. I know that the country does not vary within a project. There seem to be two ways I can do this:
SELECT
  project-id,
  MIN(country) AS country
FROM data
GROUP BY project-id
or
SELECT
  project-id,
  country
FROM data
GROUP BY
  project-id,
  country
These both work, but neither seems right. The first puts an extra burden on the GROUP BY with an unnecessary MIN calculation, while the second suggests to anyone reading the query that I want to group by the country data.
I'm always surprised that there is no FIRST, LAST, or ANY aggregation function, but as far as I can tell neither SQL Server, MySQL, nor Postgres has one.
How can I write this query so that the GROUP BY does not need to do extra work aggregating a column whose entries are identical, in a way that makes it obvious in the SQL that I do not care which of the aggregated values is chosen to represent the set?
Two errors in my original question: FIRST and LAST. I need to constantly remind myself that SQL rows have no implicit ordering (think sets not lists) and so these aggregate functions would not make sense. But ANY does make sense, and I have expected it to be available time and time again.
In a comment on the question #amir-saleem points out that this does exist in MySQL. Using the ANY_VALUE aggregation function I could write
SELECT project-id, ANY_VALUE(country) AS country FROM data GROUP BY project-id
(N.B. ANY_VALUE looks like a useful aggregation function, but it is included in the Miscellaneous Functions documentation rather than the Aggregate Functions documentation; I do not know why.)
There is no similar aggregate function in Postgres as far as I can tell, though I could use a custom aggregate function to achieve this. Here are some community-provided examples that would work in my case (a sketch of the pattern follows below):
First/last (aggregate)
Aggregate_Random
(I have not investigated SQL Server based solutions.)
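For concreteness, here is a minimal sketch of that custom-aggregate pattern in Postgres, adapted from the linked community examples; the function and aggregate names are illustrative:

-- State transition function: once the state is set, keep it and ignore new values.
CREATE OR REPLACE FUNCTION first_agg(anyelement, anyelement)
RETURNS anyelement
LANGUAGE sql IMMUTABLE STRICT AS
$$ SELECT $1 $$;

-- Aggregate that returns the first value it encounters in each group.
CREATE AGGREGATE first (anyelement) (
  SFUNC = first_agg,
  STYPE = anyelement
);

-- Usage (the hyphenated column name needs quoting in real SQL):
SELECT "project-id", first(country) AS country
FROM data
GROUP BY "project-id";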
If I'm understanding correctly, you basically want a list of all project IDs and their associated countries?
If so, you can do so by simply adding DISTINCT after SELECT, as shown below:
SELECT DISTINCT project-id, country FROM data

Generate dynamic date columns in a SELECT query SQL

First of I've got a table like this:
vID  bID  date        type  value
1    100  22.01.2021  o     250.00
1    110  25.01.2021  c     100.00
2    120  13.02.2021  o     400.00
3    130  20.02.2021  o     475.00
3    140  11.03.2022  c     75.00
1    150  15.03.2022  o     560.00
To show which values were ordered (o) and charged (c) per month, I have to 'generate' columns for each month, both ordered and charged, in an MSSQL SELECT query.
Here is an example table of what I want to get:
vID  JAN2021O  JAN2021C  FEB2021O  FEB2021C  …  MAR2022O  MAR2022C
1    250.00    100.00                            560.00
2                        400.00
3                        475.00                            75.00
I need a way to join this in an SQL SELECT alongside some other columns I already have.
Does anyone have an idea and could help me, please?
The SQL language has a very strict requirement to know the number of columns in the results and the type of each column at query compile time, before looking at any data in the tables. This applies even to SELECT * and PIVOT queries, where the columns are still determined at query compile time via the table definition (not data) or SQL statement.
Therefore, what you want to do is only possible in a single query if you want to show a specific, known number of months from a base date. In that case, you can accomplish this by specifying each column in the SQL and using date math with conditional aggregation to figure the value for each of the months from your starting point. The PIVOT keyword can help reduce the code, but you're still specifying every column by hand, and the query will still be far from trivial.
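As a hedged sketch of the fixed-column approach, conditional aggregation looks roughly like this, with every month/type pair written out by hand (the table name vtable is my assumption, since the question doesn't give one):

SELECT vID,
       SUM(CASE WHEN YEAR(date) = 2021 AND MONTH(date) = 1 AND type = 'o'
                THEN value END) AS JAN2021O,
       SUM(CASE WHEN YEAR(date) = 2021 AND MONTH(date) = 1 AND type = 'c'
                THEN value END) AS JAN2021C,
       -- ... one pair of columns per remaining month ...
       SUM(CASE WHEN YEAR(date) = 2022 AND MONTH(date) = 3 AND type = 'o'
                THEN value END) AS MAR2022O,
       SUM(CASE WHEN YEAR(date) = 2022 AND MONTH(date) = 3 AND type = 'c'
                THEN value END) AS MAR2022C
FROM vtable
GROUP BY vID;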
If you do not have a specific, known number of months to evaluate, you must do this over several steps:
Run a query to find out how many months you have.
Use the result from step 1 to dynamically construct a new statement.
Run the statement constructed in step 2.
There is no other way.
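A rough T-SQL sketch of those three steps, again assuming a source table vtable; the column-name construction is simplified for illustration, FORMAT's output is locale-dependent, and STRING_AGG requires SQL Server 2017+:

DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- Step 1: discover which month/type combinations exist and build one
-- column definition for each of them.
SELECT @cols = STRING_AGG(CAST(
           'SUM(CASE WHEN FORMAT(date, ''MMMyyyy'') = ''' + mon +
           ''' AND type = ''' + type + ''' THEN value END) AS ' +
           QUOTENAME(UPPER(mon) + UPPER(type)) AS nvarchar(max)), ', ')
FROM (SELECT DISTINCT FORMAT(date, 'MMMyyyy') AS mon, type FROM vtable) m;

-- Step 2: embed the generated column list in a new statement.
SET @sql = N'SELECT vID, ' + @cols + N' FROM vtable GROUP BY vID;';

-- Step 3: run the constructed statement.
EXEC sp_executesql @sql;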
Even then, this kind of pivot is usually better handled in the client code or reporting tool (at the presentation level) than via SQL itself.
It's not as likely to come up for this specific query, but you should also be aware there are certain security issues that can be raised from this kind of dynamic SQL, because some of the normal mechanisms to protect against injection issues aren't available (you can't parameterize the names of the source columns, which are dependent on data that might be user-generated) as you build the new query in step 2.

How to get all combinations (ordered sampling without replacement) in regex

I'm trying to match a comma-separated string of numbers to a certain pattern within an sql query. I used regular expressions for similar problems in the past successfully, so I'm trying to get them working here as well. The problem is as follows:
The string may contain any number in a range (e.g. 1-4) exactly 0-1 times.
Two numbers are comma-separated
The numbers have to be in ascending order
(I think this is kind of a case of ordered sampling without replacement)
Sticking with the example of 1-4, the following entries should match:
1
1,2
1,3
1,4
1,2,3
1,2,4
1,3,4
1,2,3,4
2
2,3
2,4
3
3,4
4
and these should not:
q dawda 323123 a3 a1 1aa,1234 4321 a4,32,1a 1112222334411
1,,2,33,444, 11,12,a 234 2,2,3 33 3,3,3 3,34 34 123 1,4,4,4a 1,444
The best try I currently have is:
\b[1-4][\,]?[2-4]?[\,]?[3-4]?[\,]?[4]?\b
This still has two major drawbacks:
It delivers quite a lot of false positives; numbers are not eliminated after they have occurred once.
It will get rather long when the range of numbers increases; e.g. 1-18 is already possible as well, and bigger ranges are conceivable.
I used regexpal for testing purposes.
Side notes:
As I'm using SQL, it would be possible to implement an algorithm in another language to generate all the possible combinations and save them in a table that can be used for joining; see e.g. How to get all possible combinations of a list’s elements?. I would like to rely on that only as a last resort, as it involves creating new tables that will contain a lot of entries (a sketch of such a generator follows these notes).
The resulting sql statement that uses the regex should run on both Postgres and Oracle.
The set of positive examples corresponds to the powerset of {1, 2, 3, 4} (minus the empty set, with each subset written in ascending order).
Edit: Clarified the list of positive examples
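For the lookup-table fallback mentioned in the side notes, a recursive CTE can generate every valid combination string for the 1-4 range directly in Postgres; this is only a sketch, and the CTE and column names are my own:

WITH RECURSIVE combos (s, last) AS (
    -- Start with each single number as its own combination.
    SELECT n::text, n FROM generate_series(1, 4) AS n
    UNION ALL
    -- Extend each combination with any strictly larger number,
    -- which guarantees ascending order and uniqueness.
    SELECT c.s || ',' || n, n
    FROM combos c
    JOIN generate_series(1, 4) AS n ON n > c.last
)
SELECT s FROM combos ORDER BY s;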
I wouldn't use Regex for this, as e.g. the requirements "have to be unique" and "have to be in ascending order" can't really be expressed with a regular expression (at least I can't think of a way to do that).
As you also need to have an expression that is identical in Postgres and Oracle, I would create a function that checks such a list and then hide the DBMS specific implementation in that function.
For Postgres I would use its array handling features to implement that function:
create or replace function is_valid(p_input text)
  returns boolean
as
$$
  select coalesce(array_agg(x order by x) = string_to_array(p_input, ','), false)
  from (
    select distinct x
    from unnest(string_to_array(p_input, ',')) as t(x)
    where x ~ '^[0-9]+$' -- only numbers
  ) t
  where x::int between 1 and 4 -- the cast is safe as the inner query only returns valid numbers
$$
language sql;
The inner query returns all (distinct) elements from the input list as individual numbers. The outer query then aggregates that back for values in the desired range and numeric order. If that result isn't the same as the input, the input isn't valid.
Then with the following sample data:
with sample_data (input) as (
  values
    ('1'),
    ('1,2'),
    ('1,3'),
    ('1,4'),
    ('1,2,3'),
    ('1,2,4'),
    ('foo'),
    ('1aa,1234'),
    ('1,,2,33,444,')
)
select input, is_valid(input)
from sample_data;
It will return:
    input     | is_valid
--------------+----------
 1            | true
 1,2          | true
 1,3          | true
 1,4          | true
 1,2,3        | true
 1,2,4        | true
 foo          | false
 1aa,1234     | false
 1,,2,33,444, | false
If you want to use the same function in Postgres and Oracle, you will probably need to use returns integer in Postgres, as Oracle still doesn't support a boolean data type in SQL.
Oracle's string processing functions are less powerful than Postgres' (e.g. there is no string_to_array or unnest), but you can probably implement similar logic in PL/SQL as well (albeit more complicated); a sketch follows.
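A hedged PL/SQL sketch of such a function (untested; it returns 1/0 instead of a boolean and hard-codes the 1-4 range, mirroring the Postgres version):

CREATE OR REPLACE FUNCTION is_valid(p_input VARCHAR2)
RETURN NUMBER
IS
  l_prev NUMBER := 0;
  l_cur  NUMBER;
BEGIN
  -- Reject anything that is not a comma-separated list of digit groups.
  IF NOT REGEXP_LIKE(p_input, '^[0-9]+(,[0-9]+)*$') THEN
    RETURN 0;
  END IF;
  -- Walk the elements; require values in 1..4 in strictly ascending order
  -- (strictly ascending also rules out duplicates).
  FOR i IN 1 .. REGEXP_COUNT(p_input, ',') + 1 LOOP
    l_cur := TO_NUMBER(REGEXP_SUBSTR(p_input, '[^,]+', 1, i));
    IF l_cur NOT BETWEEN 1 AND 4 OR l_cur <= l_prev THEN
      RETURN 0;
    END IF;
    l_prev := l_cur;
  END LOOP;
  RETURN 1;
END is_valid;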

Dynamic use of MDX AVG function

Anyone have advice on how to build an average measure that is dynamic -- it doesn't specify a particular slice but instead uses your current view? I'm working within a front-end OLAP viewer (Strategy Companion) and I need a "dynamic" implementation based on the dimensions that are currently filtered in the data view.
My fact table looks something like this:
Key  AmountA  IndicatorA  AmountB  Other Data
1    5        1           null     25
2    6        1           null     52
3    7        1           2        106
4    null     0           4        108
Now I can specify a simple average for "[Measures].[AmountA]" with "[Measures].[AmountA] / [Measures].[IndicatorA]" which works great - "[IndicatorA]" sums up to the number of non-null values of "[AmountA]". And this also works great no matter what dimensions are selected in the view - it always divides by the count of rows that have been filtered in.
But what about [AmountB]? I don't have a null indicator column. I want to get an average value of [AmountB] for whatever rows have been filtered in for my current view. If I try to use the count of rows as a simple formula (pseudo-code "[Measures].[AmountB] / Count([Measures].[Key])") I get the wrong result, because it is counting all the null rows in the average.
So, I need a way to use the AVG function to specify the average of [AmountB] over the set of "whatever rows I'm currently filtering in, based on whatever dimensions I'm currently using". How do I specify this dynamic set?
I've tried several different uses of the AVG function and they have either returned null or summed up to huge numbers, clearly not the average I'm looking for.
Thanks-
Matt
Sorry, my first suggestion was wrong. If you don't have access to the OLAP cube, you can't write an MDX query for this purpose (IMHO), because at that access level you don't have any detailed data from your fact table; you can only use aggregated data and dimensions from your cube.
Otherwise (if you have access to the OLAP database), you can create this metric (a count of non-NULL rows) in your measure group and then use it for the AVG calculation (as a calculated member in your cube, or in the WITH section of your MDX query).

A simple SQL Select query to crawl all connected people in a social graph?

What is the shortest or fastest SQL SELECT query or procedure to crawl a social graph? Imagine we have this table:
UId  FriendId
1    2
2    1
2    4
1    3
5    7
7    5
7    8
5    9
9    7
We have two subsets of people here. I'm talking about an SQL query or procedure where, if we pass:
UId = 4, it returns the rows with UId: {1, 2, 3}
or if
UId = 9, it returns the rows with UId: {5, 7, 8}
Sorry for my poor English.
So you want to get all friends of someone, including n-th degree friends? I don't think that is possible without recursion.
How you can do that is explained here:
https://inviqa.com/blog/graphs-database-sql-meets-social-network
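As a hedged illustration, a recursive CTE can do this crawl directly in SQL. The sketch below uses Postgres syntax and assumes the table is named friends(UId, FriendId); SQL Server would need UNION ALL plus an explicit cycle check, since it does not allow UNION in a recursive CTE:

WITH RECURSIVE reachable (id) AS (
    SELECT 4              -- the starting UId (hard-coded for illustration)
    UNION                 -- UNION (not UNION ALL) de-duplicates and stops cycles
    SELECT CASE WHEN f.UId = r.id THEN f.FriendId ELSE f.UId END
    FROM friends f
    JOIN reachable r ON r.id IN (f.UId, f.FriendId)
)
SELECT id FROM reachable WHERE id <> 4;   -- returns {1, 2, 3} for the sample data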
If you are storing your values in an adjacency list, the easiest way I've found to crawl it is to translate it into a graphing language and query that. For example, if you were working in PHP, you could use the Image_GraphViz package. Or, if you want to use AJAX, you might consider cytoscapeweb. Both work well.
In either case, you'd SELECT * FROM mytable and feed all the records into the graph package as nodes. This means outputting them in dot or GraphML (or other graphing language). Then you can easily query them.
If you don't wish to translate the dataset, consider storing it as nested sets. Nested sets, though a bit of a pain to maintain, are much better than adjacency lists for the kind of queries you are looking to do.
If you are storing your values in an adjacency list and you want the n-th degree, you can simply INNER JOIN the table to itself n times. For example, assuming the table above is named friends:
SELECT t1.UId, t2.FriendId, t3.FriendId FROM friends t1 INNER JOIN friends t2 ON t1.FriendId = t2.UId INNER JOIN friends t3 ON t2.FriendId = t3.UId
This query is like a DFS with a fixed depth.