Is there an equivalent of Excel's PERCENTRANK (column,value) in SQL? I think native SQL Server functions provide only percentiles from the same column, but am I missing something?
The problem is that I have two columns, A and B. I need to take the distribution of values in column A and rank the values from column B against it: how do the values in B fit into the distribution defined by column A? (Precisely: get the CDF of column A at the point defined by B.)
Possible solutions:
Write a UDF (they don't parallelize, do they?). Can you suggest how I could structure such a UDF efficiently?
Use SQL Server 2016 and integrate it with R (R code within SQL code)
Copy data to R, perform calculations, send back to server
Copy data to Excel and calculate it over there "manually", import results back to database
There are multiple columns with values and distributions I need to apply this strategy to, so I am looking for an efficient solution.
Edit - example:
column A   column B   result   formula
10         16         0,20     =PERCENTRANK($A1:$A4, B1)
20         35         0,83     =PERCENTRANK($A1:$A4, B2)
30         10         0,00     =PERCENTRANK($A1:$A4, B3)
40         25         0,50     =PERCENTRANK($A1:$A4, B4)
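This could serve as an approximate, not foolproof, numerical solution, with the column passed by name. For each value to score, it returns the PERCENT_RANK of the nearest value in the reference column, without interpolating: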
CREATE FUNCTION [dbo].[PERCENTRANK_column](
@colname nvarchar(50)
)
RETURNS @resultingtable TABLE(
RowID int,
PercentRank float
)
AS
BEGIN
;WITH true_CDF AS (
SELECT
PERCENT_RANK() OVER (PARTITION BY 1 ORDER BY
CASE @colname
WHEN 'X' THEN X
WHEN 'Y' THEN Y
ELSE NULL
END
ASC) as PercentRank
,
CASE @colname
WHEN 'X' THEN X
WHEN 'Y' THEN Y
ELSE NULL
END AS selected_col
FROM dbo.MainTable
)
, selectColumn AS (
SELECT
m.RowID
,CASE @colname
WHEN 'X' THEN to_score_X
WHEN 'Y' THEN to_score_Y
ELSE NULL
END as to_score
FROM dbo.MainTable m
)
, added_Mins AS (
SELECT
B.RowID
, ABS(B.to_score - A.selected_col) as diff
, MIN(ABS(B.to_score - A.selected_col)) OVER (PARTITION BY 1) as lowest_difference
, A.PercentRank
FROM selectColumn B
CROSS JOIN true_CDF A
)
INSERT INTO @resultingtable
SELECT
RowID, PercentRank
FROM added_Mins
WHERE lowest_difference = diff
RETURN
END
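To check the output of any SQL implementation against the example table in the question, the interpolating behaviour of PERCENTRANK can be sketched outside the database. The following Python is a minimal sketch, not the function above: the name percentrank is hypothetical, values outside the reference range return None (where Excel reports an error), and the handling of duplicate reference values is an assumption.

```python
from bisect import bisect_left


def percentrank(reference, x):
    """Approximate Excel-style PERCENTRANK of x against the reference values.

    Assumption: ranks run from 0 to 1 over the sorted reference values, and a
    value between two reference values is linearly interpolated between them.
    """
    values = sorted(reference)
    n = len(values)
    if n < 2 or x < values[0] or x > values[-1]:
        return None  # outside the reference distribution
    i = bisect_left(values, x)  # number of reference values below x
    if values[i] == x:
        return i / (n - 1)
    lo, hi = values[i - 1], values[i]
    lo_rank = bisect_left(values, lo)  # rank position of the lower neighbour
    fraction = (x - lo) / (hi - lo)
    return (lo_rank + fraction * (i - lo_rank)) / (n - 1)


column_a = [10, 20, 30, 40]
for b in [16, 35, 10, 25]:
    print(b, round(percentrank(column_a, b), 2))
```

With the question's data this reproduces the result column (0.2, 0.83, 0.0, 0.5), which makes it usable as a reference when testing a set-based SQL version.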
Suppose I have 10 columns in my table and I want to update exactly one column per row: the first column in the first row, the second column in the second row, and so on, for up to 10 rows.
if table is like
1,2,3
4,5,6
7,8,9
I want to update it like
x,2,3
4,y,6
7,8,z
The number of columns can vary, so I need a dynamic approach. Sometimes I also need to exclude some columns.
I tried to see if I can update a row based on a row id, but no such row id is available. I don't want to change the design of the table to include a counter column.
You can use a window function to assign a row number and update based on that:
with cte as (
select id, row_number() over (order by id) as rn
from tablename
)
update t
set col1 = case when cte.rn = 1 then <updatevalue> else t.col1 end
, col2 = case when cte.rn = 2 then <updatevalue> else t.col2 end
, col3 = case when cte.rn = 3 then <updatevalue> else t.col3 end
, ...
from tablename t
join cte on cte.id = t.id
The requirement "Columns can be of any count so need dynamic approach" looks like an attempt to implement a matrix as a table.
An alternative approach is to use the ARRAY type and store the entire structure as a single "cell" in the table.
CREATE OR REPLACE TABLE t
AS
SELECT ARRAY_CONSTRUCT(ARRAY_CONSTRUCT(1,2,3),
ARRAY_CONSTRUCT(4,5,6),
ARRAY_CONSTRUCT(7,8,9)) c
UNION ALL
SELECT ARRAY_CONSTRUCT(ARRAY_CONSTRUCT(10,20,30),
ARRAY_CONSTRUCT(40,50,60),
ARRAY_CONSTRUCT(70,80,90)) c;
SELECT *
FROM t;
/*
C
[[1,2,3],[4,5,6],[7,8,9]]
[[10,20,30],[40,50,60],[70,80,90]]
*/
Accessing elements:
SELECT c[0][0], c[0][1], c[0][2],
c[1][0], c[1][1], c[1][2],
c[2][0], c[2][1], c[2][2]
FROM t;
/*
C[0][0] C[0][1] C[0][2] C[1][0] C[1][1] C[1][2] C[2][0] C[2][1] C[2][2]
1 2 3 4 5 6 7 8 9
10 20 30 40 50 60 70 80 90
*/
Update:
UPDATE t
SET c = ARRAY_CONSTRUCT(ARRAY_CONSTRUCT('x' , c[0][1], c[0][2])
,ARRAY_CONSTRUCT(c[1][0], 'y' ,c[1][2])
,ARRAY_CONSTRUCT(c[2][0], c[2][1] , 'z' )
);
SELECT * FROM t;
/*
C
[["x",2,3],[4,"y",6],[7,8,"z"]]
[["x",20,30],[40,"y",60],[70,80,"z"]]
*/
More robust transformations could be performed via user-defined functions.
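The diagonal update above is written out for a fixed 3x3 shape. As a rough illustration of the logic a user-defined function would need for an arbitrary size, here is a minimal Python sketch; update_diagonal is a hypothetical name, and the function only mirrors the idea, not any particular database's UDF API.

```python
def update_diagonal(matrix, new_values):
    """Return a copy of matrix where row i has its column i replaced.

    Rows or columns that do not exist for a given index are left untouched,
    so the matrix does not have to be square.
    """
    updated = [list(row) for row in matrix]  # copy, do not mutate the input
    for i, value in enumerate(new_values):
        if i < len(updated) and i < len(updated[i]):
            updated[i][i] = value
    return updated


print(update_diagonal([[1, 2, 3], [4, 5, 6], [7, 8, 9]], ["x", "y", "z"]))
```

For the first row of the sample table this yields the same structure as the UPDATE statement, with "x", "y" and "z" on the diagonal.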
I'm trying to reference this calculated column, 'RelativeEffectiveSpreadAbsoluteValue', in the FROM part of a SQL Server query.
, case when cast(sa.Mid_Price as float) = 0
then 0
else ((CAST(sa.Ask_Price as float)-cast(sa.Bid_Price as float))/CAST(sa.Mid_Price as float))/(0.01/100)
end As RelativeEffectiveSpreadAbsoluteValue
like this, but SQL Server won't recognize it:
left join [RISK].[dbo].[FILiquidityBuckets] FB6
ON FB6.Metric = 'Relative spread ' AND (
((CAST(RelativeEffectiveSpreadAbsoluteValue AS FLOAT)>= 0 AND CAST(RelativeEffectiveSpreadAbsoluteValue AS FLOAT)< 1000000) AND
FB6.LiquidityScore = 5) OR
((CAST(RelativeEffectiveSpreadAbsoluteValue AS FLOAT)>= 1000000 AND CAST(RelativeEffectiveSpreadAbsoluteValue AS FLOAT)<10000000) AND
FB6.LiquidityScore = 4) OR
((CAST(RelativeEffectiveSpreadAbsoluteValue AS FLOAT)>= 10000000 AND CAST(RelativeEffectiveSpreadAbsoluteValue AS FLOAT)< 100000000) AND
FB6.LiquidityScore = 3) OR
((CAST(RelativeEffectiveSpreadAbsoluteValue AS FLOAT)>= 100000000 AND CAST(RelativeEffectiveSpreadAbsoluteValue AS FLOAT)<1000000000) AND
FB6.LiquidityScore = 2) OR
(CAST(RelativeEffectiveSpreadAbsoluteValue AS FLOAT) >= 1000000000 AND FB6.LiquidityScore = 1)
)
So far I know that by using CROSS APPLY, a calculated column can be used to calculate another column in the same view, as in this example:
Select
ColumnA,
ColumnB,
c.calccolumn1 As calccolumn1,
c.calccolumn1 / ColumnC As calccolumn2
from t42
cross apply (select (ColumnA + ColumnB) as calccolumn1) as c
But this is only for the SELECT part; can we use it in the FROM part?
Please help thank you!
Put the apply operation which does the calculation prior to the join in your query:
create table t(a int);
create table u(b int);
select t.a,
t2.calculatedColumn,
u.b
from t
cross apply (select t.a * 2) as t2 (calculatedColumn)
left join u on u.b = t2.calculatedColumn
As Panagiotis observed, this may result in a slow join because the join predicate will not be able to use an index. But if the nature of your query demands it, the language supports it.
If you need this to be fast, create an indexed computed column on the table you have aliased as sa instead of calculating it in the query. Since your column will be of type float, you will need to mark the computed column as persisted before you can index it.
I am trying to write a query that will return data sorted by an alphanumeric column, Code.
Below is my query:
SELECT *
FROM <<TableName>>
CROSS APPLY (SELECT PATINDEX('[A-Z, a-z][0-9]%', [Code]),
CHARINDEX('', [Code]) ) ca(PatPos, SpacePos)
CROSS APPLY (SELECT CONVERT(INTEGER, CASE WHEN ca.PatPos = 1 THEN
SUBSTRING([Code], 2,ISNULL(NULLIF(ca.SpacePos,0)-2, 8000)) ELSE NULL END),
CASE WHEN ca.PatPos = 1 THEN LEFT([Code],
ISNULL(NULLIF(ca.SpacePos,0)-0,1)) ELSE [Code] END) ca2(OrderBy2, OrderBy1)
WHERE [TypeID] = '1'
OUTPUT:
FFS1
FFS2
...
FFS12
FFS1.1
FFS1.2
...
FFS1.1E
FFS1.1R
...
FFS12.1
FFS12.2
FFS.12.1E
FFS12.1R
FFS12.2E
FFS12.2R
DESIRED OUTPUT:
FFS1
FFS1.1
FFS1.1E
FFS1.1R
....
FFS12
FFS12.1
FFS12.1E
FFS12.1R
What am I missing or overlooking?
EDIT:
Let me try to detail the table contents a little better. There are records for FFS1 - FFS12. Those are broken into X subs, i.e., FFS1.1 - FFS1.X to FFS12.1 - FFS12.X. The E and the R was not a typo, each sub record has two codes associated with it: FFS1.1E & FFS1.1R.
Additionally I tried using ORDER BY but it sorted as
FFS1
...
FFS10
FFS2
This will work for any number of parts separated by dots. Each part is sorted alphanumerically, that is, character by character, so FFS2 still sorts after FFS12, as the result shows.
DECLARE @YourValues TABLE(ID INT IDENTITY, SomeVal VARCHAR(100));
INSERT INTO @YourValues VALUES
('FFS1')
,('FFS2')
,('FFS12')
,('FFS1.1')
,('FFS1.2')
,('FFS1.1E')
,('FFS1.1R')
,('FFS12.1')
,('FFS12.2')
,('FFS.12.1E')
,('FFS12.1R')
,('FFS12.2E')
,('FFS12.2R');
--The query
WITH Splittable AS
(
SELECT ID
,SomeVal
,CAST(N'<x>' + REPLACE(SomeVal,'.','</x><x>') + N'</x>' AS XML) AS Casted
FROM @YourValues
)
,Parted AS
(
SELECT Splittable.*
,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS PartNmbr
,A.part.value(N'text()[1]','nvarchar(max)') AS Part
FROM Splittable
CROSS APPLY Splittable.Casted.nodes(N'/x') AS A(part)
)
,AddSortCrit AS
(
SELECT ID
,SomeVal
,(SELECT LEFT(x.Part + REPLICATE(' ',10),10) AS [*]
FROM Parted AS x
WHERE x.ID=Parted.ID
ORDER BY PartNmbr
FOR XML PATH('')
) AS SortColumn
FROM Parted
GROUP BY ID,SomeVal
)
SELECT ID
,SomeVal
FROM AddSortCrit
ORDER BY SortColumn;
The result
ID SomeVal
10 FFS.12.1E
1 FFS1
4 FFS1.1
6 FFS1.1E
7 FFS1.1R
5 FFS1.2
3 FFS12
8 FFS12.1
11 FFS12.1R
9 FFS12.2
12 FFS12.2E
13 FFS12.2R
2 FFS2
Some explanation:
The first CTE transforms your codes to XML, which allows each part to be addressed separately.
The second CTE returns each part together with a number.
The third CTE re-concatenates your code, but with each part padded to a length of 10 characters.
The final SELECT uses this new single string per row in the ORDER BY.
Final hint:
This design is bad! You should not store these values in concatenated strings... Store them in separate columns and fiddle them together just for the output/presentation layer. Doing so avoids this rather ugly fiddle...
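For comparison, an ordering that treats the digits inside each dot-separated part as numbers (so that FFS2 comes before FFS12, as in the desired output) can be sketched outside the database. This Python is a minimal sketch of that idea, not a translation of the query above; natural_key is a hypothetical name, and placing numbers before letters within a part is an assumption.

```python
import re


def natural_key(code):
    """Build a sort key: split on dots, then split each part into digit and
    non-digit runs so that digit runs compare as integers."""
    parts = []
    for part in code.split("."):
        tokens = re.findall(r"\d+|\D+", part)
        parts.append(tuple(
            (0, int(t), "") if t.isdecimal() else (1, 0, t)
            for t in tokens
        ))
    return tuple(parts)


codes = ["FFS1", "FFS2", "FFS12", "FFS1.1", "FFS1.2",
         "FFS1.1E", "FFS1.1R", "FFS12.1", "FFS12.1R"]
print(sorted(codes, key=natural_key))
```

Because a shorter key that is a prefix of a longer one sorts first, FFS1 precedes FFS1.1, which precedes FFS1.1E. The same split into separately comparable pieces is what storing the parts in separate columns would give you directly.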
I would like to use the IN clause, but with the convert function.
Basically, I have a table (A) with the column of type int.
But in the other table (B) I Have values which are of type varchar.
Essentially, what I am looking for something like this
select *
from B
where myB_Column IN (select myA_Columng from A)
However, I am not sure if the int from table A, would map / convert / evaluate properly for the varchar in B.
I am using SQL Server 2008.
You can use a CASE expression in the WHERE clause like this and CAST only if the value is numeric;
otherwise return 0 or NULL, depending on your requirements.
SELECT *
FROM B
WHERE CASE ISNUMERIC(myB_Column) WHEN 1 THEN CAST(myB_Column AS INT) ELSE 0 END
IN (SELECT myA_Columng FROM A)
ISNUMERIC also returns 1 (true) for decimal values, so ideally you should implement your own IsInteger UDF. To do that, look at this question:
T-sql - determine if value is integer
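The distinction between "numeric" and "integer" can be illustrated with a short Python sketch. It is only an analogue of the stricter check an IsInteger UDF would perform: parse_int_or_none and rows_matching are hypothetical names, and tolerating surrounding whitespace is an assumption.

```python
import re

# Optional sign followed by ASCII digits only: no decimals, no exponents.
_INTEGER_TEXT = re.compile(r"[+-]?[0-9]+")


def parse_int_or_none(text):
    """Return the integer value of text, or None if it is not a plain integer."""
    if text is None:
        return None
    stripped = text.strip()
    if _INTEGER_TEXT.fullmatch(stripped):
        return int(stripped)
    return None


def rows_matching(b_values, a_values):
    """Keep the varchar-like values from B whose integer value occurs in A."""
    a_set = set(a_values)  # one lookup per row instead of scanning the list
    return [b for b in b_values if parse_int_or_none(b) in a_set]


print(rows_matching(["7", "12.5", "abc", " 9 ", "10"], [7, 9, 10]))
```

Values such as "12.5" are rejected before any conversion is attempted, which is the behaviour the CASE expression above cannot guarantee when it relies on ISNUMERIC alone.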
Option #1
Select * from B where myB_Column IN
(
Select Cast(myA_Columng As Int) from A Where ISNUMERIC(myA_Columng) = 1
)
Option #2
Select B.* from B
Inner Join
(
Select Cast(myA_Columng As Int) As myA_Columng from A
Where ISNUMERIC(myA_Columng) = 1
) T
On T.myA_Columng = B.myB_Column
Option #3
Select B.* from B
Left Join
(
Select Cast(myA_Columng As Int) As myA_Columng from A
Where ISNUMERIC(myA_Columng) = 1
) T
On T.myA_Columng = B.myB_Column
I would opt for the third one, for the reason given below.
Disadvantages of IN Predicate
Suppose I have two list objects.
List 1 List 2
1 12
2 7
3 8
4 98
5 9
6 10
7 6
Using Contains, each List 1 item is searched for in List 2, which means 49 iterations!
You can also use an EXISTS clause:
select *
from B
where EXISTS (select 1 from A WHERE CAST(myA_Columng AS VARCHAR) = myB_Column)
You can use the query below:
select B.*
from B
inner join (Select distinct MyA_Columng from A) AS X ON B.MyB_Column = CAST(x.MyA_Columng as NVARCHAR(50))
Try using CAST():
SELECT *
FROM B
WHERE CAST(myB_Column AS INT) IN (
SELECT myA_Columng
FROM A
)
I have a coworker looking for this, and I don't recall ever running into anything like that.
Is there a reasonable technique that would let you simulate it?
SELECT PRODUCT(X)
FROM
(
SELECT 3 X FROM DUAL
UNION ALL
SELECT 5 X FROM DUAL
UNION ALL
SELECT 2 X FROM DUAL
)
would yield 30
select exp(sum(ln(col)))
from table;
Edit: this works only if col is always > 0.
DECLARE @a int
SET @a = 1
-- re-assign @a for each row in the result
-- as what @a was before * the value in the row
SELECT @a = @a * amount
FROM theTable
There's a way to do string concatenation that is similar:
DECLARE @b varchar(max)
SET @b = ''
SELECT @b = @b + CustomerName
FROM Customers
Here's another way to do it. This is definitely the longer way to do it but it was part of a fun project.
You've got to reach back to school for this one, lol. The key thing to remember here is that LOG is the inverse of the exponential function.
LOG10(X*Y) = LOG10(X) + LOG10(Y)
or
ln(X*Y) = ln(X) + ln(Y) (ln = natural log, i.e. log base e)
Example
If X=5 and Y=6
X * Y = 30
ln(5) + ln(6) ≈ 3.4
ln(30) ≈ 3.4
e^3.4 ≈ 30, which is also 5 x 6
EXP(3.4) ≈ 30
So above, if 5 and 6 each occupied a row in the table, we take the natural log of each value, sum up the rows, then take the exponent of the sum to get 30.
Below is the code in a SQL statement for SQL Server. Some editing is likely required to make it run on Oracle. Hopefully it's not a big difference but I suspect at least the CASE statement isn't the same on Oracle. You'll notice some extra stuff in there to test if the sign of the row is negative.
CREATE TABLE DUAL (VAL INT NOT NULL)
INSERT DUAL VALUES (3)
INSERT DUAL VALUES (5)
INSERT DUAL VALUES (2)
SELECT
CASE SUM(CASE WHEN SIGN(VAL) = -1 THEN 1 ELSE 0 END) % 2
WHEN 1 THEN -1
ELSE 1
END
* CASE
WHEN MIN(ABS(VAL)) = 0 THEN 0 -- any zero makes the product zero; SUM(VAL) = 0 would wrongly catch e.g. -3 and 3
WHEN SUM(VAL) IS NOT NULL THEN EXP(SUM(LOG(ABS(CASE WHEN SIGN(VAL) <> 0 THEN VAL END))))
ELSE NULL
END
* CASE MIN(ABS(VAL)) WHEN 0 THEN 0 ELSE 1 END
AS PRODUCT
FROM DUAL
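The same three steps (count the negative signs, short-circuit on zero, then exponentiate the sum of logs of the absolute values) can be checked outside the database. This Python is a minimal sketch with a hypothetical function name; it is meant for verifying the arithmetic, not as a replacement for the query.

```python
import math


def product_via_logs(values):
    """Multiply values using exp(sum(log(|v|))), handling zeros and signs.

    log() is undefined for zero and negative numbers, so zeros are handled
    up front and the sign is reapplied from the count of negative values.
    """
    if any(v == 0 for v in values):
        return 0.0
    negatives = sum(1 for v in values if v < 0)
    sign = -1.0 if negatives % 2 else 1.0
    return sign * math.exp(sum(math.log(abs(v)) for v in values))


print(product_via_logs([3, 5, 2]))   # close to 30, up to floating-point error
print(product_via_logs([-3, 3]))     # close to -9
```

The result is a floating-point approximation, so for integer inputs it may need rounding before it is compared with an exact product.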
The accepted answer by tuinstoel is correct, of course:
select exp(sum(ln(col)))
from table;
But notice that if col is of type NUMBER, you will find tremendous performance improvement when using BINARY_DOUBLE instead. Ideally, you would have a BINARY_DOUBLE column in your table, but if that's not possible, you can still cast col to BINARY_DOUBLE. I got a 100x improvement in a simple test that I documented here, for this cast:
select exp(sum(ln(cast(col as binary_double))))
from table;
Is there a reasonable technique that would let you simulate it?
One technique is to use LISTAGG to build a product_expression string and XMLTABLE + DBMS_XMLGEN.GETXMLTYPE to evaluate it:
WITH cte AS (
SELECT grp, LISTAGG(l, '*') AS product_expression
FROM t
GROUP BY grp
)
SELECT c.*, s.val AS product_value
FROM cte c
CROSS APPLY(
SELECT *
FROM XMLTABLE('/ROWSET/ROW/*'
PASSING dbms_xmlgen.getXMLType('SELECT ' || c.product_expression || ' FROM dual')
COLUMNS val NUMBER PATH '.')
) s;
Output:
+------+---------------------+---------------+
| GRP | PRODUCT_EXPRESSION | PRODUCT_VALUE |
+------+---------------------+---------------+
| b | 2*6 | 12 |
| a | 3*5*7 | 105 |
+------+---------------------+---------------+
A more robust version that handles a group containing a single NULL value:
WITH cte AS (
SELECT grp, LISTAGG(l, '*') AS product_expression
FROM t
GROUP BY grp
)
SELECT c.*, s.val AS product_value
FROM cte c
OUTER APPLY(
SELECT *
FROM XMLTABLE('/ROWSET/ROW/*'
passing dbms_xmlgen.getXMLType('SELECT ' || c.product_expression || ' FROM dual')
COLUMNS val NUMBER PATH '.')
WHERE c.product_expression IS NOT NULL
) s;
*CROSS/OUTER APPLY(Oracle 12c) is used for convenience and could be replaced with nested subqueries.
This approach could be used for generating different aggregation functions.
There are many different implementations of "SQL". When you say "does SQL have", are you referring to a specific ANSI version of SQL or to a vendor-specific implementation? DavidB's answer is one that works in a few different environments I have tested, but depending on your environment you could write or find a function exactly like the one you are asking for. Say you were using Microsoft SQL Server 2005: a possible solution would be to write a custom aggregate in .NET code named PRODUCT, which would allow your original query to work exactly as you have written it.
In SQL Server, where the natural logarithm function is LOG rather than LN, you might have to do:
SELECT EXP(SUM(LOG([col])))
FROM table;