Finding a preset value nearest to the column value - sql

Apologies to those who run into this question twice; I'm trying to solve the same problem with different code, and I also posted it asking for M, Power Query, or Excel solutions.
I need to retrieve the nearest matching value from a fixed set of values for an entire column.
The set of values to match against looks as follows:
| preceding column | Sys_size |
===============================
| ... | null |
| ... | 7 |
| ... | 9 |
| ... | 12 |
| ... | 15 |
| ... | 17 |
| ... | null |
In short, the list above is variable (sizes could be added or changed) and contains null values.
Second, there is a set of variable user-entered numbers as follows:
| preceding column | User_size |
================================
| ... | 8.5 |
| ... | 13 |
| ... | 6 |
| ... | 10.5 |
| ... | 18 |
| ... | 14 |
The result I want to obtain in my script looks like this
| preceding column | User_size | Sys_size |
===========================================
| ... | 8.5 | 9 |
| ... | 13 | 12 |
| ... | 6 | 7 |
| ... | 10.5 | 12 |
| ... | 18 | 17 |
| ... | 14 | 15 |
Simply put, it finds the Sys_size nearest to the User_size input. Note that when the user's value falls exactly between two Sys_size values, the result is rounded up.
So far I have only achieved this for a single fixed input value, along these lines:
SELECT DISTINCT Sys_Size
FROM System
WHERE ABS(Sys_Size - 8.5) = (
SELECT MIN(ABS(Sys_Size - 8.5))
FROM System
)
which would return 9
But this has to be systematically applied to the entire user column. I feel I'm ridiculously close and overlooking something very obvious.

In SQL Server, you can use apply:
select u.*, s.sys_size
from users u cross apply
(select top (1) s.sys_size
from system s
where s.sys_size is not null
order by abs(s.sys_size - u.user_size)
) s;
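The question also says that a value exactly between two sizes should round up (10.5 should give 12, not 9). If that matters, a secondary sort term can break ties towards the larger size; a sketch against the same users/system tables used above:
select u.*, s.sys_size
from users u cross apply
     (select top (1) s.sys_size
      from system s
      where s.sys_size is not null
      order by abs(s.sys_size - u.user_size), s.sys_size desc
     ) s;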

Two options that I know of:
declare @U table (s decimal(5,2));
declare @S table (s decimal(5,2));
insert into @S values (7), (9), (12), (15), (17);
insert into @U values (8.5), (13), (6), (10.5), (18), (14);
select u.s as user_size, ss.s as sys_size
from @U u
cross apply ( select top 1 s.s
              from @S s
              order by abs(u.s - s.s)
            ) ss;
select us, ss, diff
from
( select u.s as us, s.s as ss
       , abs(s.s - u.s) as diff
       , ROW_NUMBER() over (partition by u.s order by abs(s.s - u.s) asc) as rn
  from @U u
  cross join @S s
  where u.s is not null and s.s is not null
) tt
where tt.rn = 1
order by us, rn

Linking Related IDs together through two other ID columns

I have a table of about 100k rows with the following layout:
+----+-----------+------------+-------------------+
| ID | PIN | RAID | Desired Output ID |
+----+-----------+------------+-------------------+
| 1 | 80602627 | 1737852-1 | 1 |
| 2 | 80602627 | 34046655-1 | 1 |
| 3 | 351418172 | 33661 | 2 |
| 4 | 351418172 | 33661 | 2 |
| 5 | 351418172 | 33661 | 2 |
| 6 | 351418172 | 34443321-1 | 2 |
| 7 | 491863017 | 26136 | 3 |
| 8 | 491863017 | 34575 | 3 |
| 9 | 491863017 | 34575 | 3 |
| 10 | 661254727 | 26136 | 3 |
| 11 | 661254727 | 26136 | 3 |
| 12 | NULL | 7517 | 4 |
| 13 | NULL | 7517 | 4 |
| 14 | NULL | 7517 | 4 |
| 15 | NULL | 7517 | 4 |
| 16 | NULL | 7517 | 4 |
| 17 | 554843813 | 33661 | 2 |
| 18 | 554843813 | 33661 | 2 |
+----+-----------+------------+-------------------+
The ID column has unique values, with the PIN and RAID columns being two separate identifying numbers used to group linked IDs together. The Desired Output ID column is what I would like SQL to produce, essentially looking at both the PIN and RAID columns to spot where there are any relationships between them.
So, for example, where Desired Output ID = 2, IDs 3-6 match on PIN = 351418172, and IDs 17-18 also match because the RAID of 33661 appears in the rows for IDs 3-5.
To add as well, NULLs will be in the PIN Column but not in any others.
I did spot a similar question, but as it is in BigQuery I wasn't sure it would help.
I have been trying to crack this one for a while with no luck; any help is massively appreciated.
I suppose DENSE_RANK can solve your problem. I'm not sure what the combination of PIN and RAID should be, but I think you'll be able to figure out how to do it from this:
SELECT *, DENSE_RANK() OVER (ORDER BY ISNULL(PIN, ID)), DENSE_RANK() OVER (ORDER BY RAID)
FROM accounts
I believe I have found a bit of a bodged solution to this. It runs very slowly as it goes row by row and will only go two links deep on PIN/RAID, but this should be sufficient for 99%+ cases.
Would appreciate any suggestions to speeding it up if anything is immediately obvious.
ID in the post above is DebtorNo in the code:
DECLARE @Counter INT = 1
DECLARE @EndCounter INT = 0
IF OBJECT_ID('Tempdb..#OrigACs') IS NOT NULL
BEGIN
DROP TABLE #OrigACs
END
SELECT DebtorNo,
Name,
PostCode,
DOB,
RAJoin,
COALESCE(PIN,DebtorNo COLLATE DATABASE_DEFAULT) AS PIN,
RelatedAssets,
RAID,
PINRelatedAssets
INTO #OrigACs
FROM MIReporting..HC_RA_Test_Data RA
IF OBJECT_ID('Tempdb..#Accounts') IS NOT NULL
BEGIN
DROP TABLE #Accounts
END
SELECT *,
ROW_NUMBER() OVER (ORDER BY CAST(RA.DebtorNo AS INT)) AS Row
INTO #Accounts
FROM #OrigACs RA
ORDER BY CAST(RA.DebtorNo AS INT)
CREATE INDEX Temp_HC_Index ON #OrigACs (RAID,PIN)
SET @EndCounter = (SELECT MAX(Row) FROM #Accounts)
WHILE @Counter <= @EndCounter
BEGIN
IF OBJECT_ID('Tempdb..#RAID1') IS NOT NULL
BEGIN
DROP TABLE #RAID1
END
SELECT *
INTO #RAID1
FROM #OrigACs A
WHERE A.RAID IN (SELECT RAID FROM #Accounts WHERE [Row] = @Counter)
IF OBJECT_ID('Tempdb..#PIN1') IS NOT NULL
BEGIN
DROP TABLE #PIN1
END
SELECT *
INTO #PIN1
FROM #OrigACs A
WHERE A.PIN IN (SELECT PIN FROM #RAID1)
IF OBJECT_ID('Tempdb..#RAID2') IS NOT NULL
BEGIN
DROP TABLE #RAID2
END
SELECT *
INTO #RAID2
FROM #OrigACs A
WHERE A.RAID IN (SELECT RAID FROM #PIN1)
IF OBJECT_ID('Tempdb..#PIN2') IS NOT NULL
BEGIN
DROP TABLE #PIN2
END
SELECT *
INTO #PIN2
FROM #OrigACs A
WHERE A.PIN IN (SELECT PIN FROM #RAID2)
INSERT INTO MIReporting..HC_RA_Final_ACs
SELECT DebtorNo,
Name,
PostCode,
DOB,
RAJoin,
CASE
WHEN PIN = DebtorNo COLLATE DATABASE_DEFAULT THEN NULL
ELSE PIN
END AS PIN,
RelatedAssets,
RAID,
PINRelatedAssets,
COALESCE((SELECT MAX(FRAID) FROM MIReporting..HC_RA_Final_ACs),0) + 1 AS FRAID
FROM #PIN2
SET @Counter = (SELECT MIN([ROW]) FROM #Accounts O WHERE O.DebtorNo NOT IN (SELECT DebtorNo FROM MIReporting..HC_RA_Final_ACs));
END;
SELECT *
FROM MIReporting..HC_RA_Final_ACs
DROP TABLE #OrigACs
DROP TABLE #Accounts
DROP TABLE #RAID1
DROP TABLE #PIN1
DROP TABLE #RAID2
DROP TABLE #PIN2
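For comparison, here is a set-based sketch (not tested against real data) that avoids the loop: rows that share a PIN or a RAID are treated as edges of a graph, a recursive CTE walks the graph, and each connected component gets a label. Table and column names are taken from the question; the VisitedPath string only exists to stop the recursion from cycling, and OPTION (MAXRECURSION 0) lifts the default 100-level limit.
WITH Accts AS (
    SELECT CAST(DebtorNo AS varchar(32)) AS ID, PIN, RAID
    FROM MIReporting..HC_RA_Test_Data
),
Edges AS (
    SELECT DISTINCT a.ID AS FromID, b.ID AS ToID
    FROM Accts a
    JOIN Accts b
      ON (a.PIN = b.PIN AND a.PIN IS NOT NULL)
      OR a.RAID = b.RAID
),
Reach AS (
    -- start from every edge, then keep following edges to IDs not visited yet
    SELECT FromID, ToID,
           CAST('/' + FromID + '/' + ToID + '/' AS varchar(max)) AS VisitedPath
    FROM Edges
    UNION ALL
    SELECT r.FromID, e.ToID, r.VisitedPath + e.ToID + '/'
    FROM Reach r
    JOIN Edges e ON e.FromID = r.ToID
    WHERE r.VisitedPath NOT LIKE '%/' + e.ToID + '/%'
)
SELECT FromID AS DebtorNo,
       DENSE_RANK() OVER (ORDER BY MIN(ToID)) AS OutputID
FROM Reach
GROUP BY FromID
OPTION (MAXRECURSION 0);
The OutputID numbering may differ from the example, but rows linked through any chain of PIN/RAID matches end up with the same OutputID. On 100k rows the Edges self-join still needs indexes on PIN and RAID to be workable.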

Calculating consecutive range of dates with a value in Hive

I want to know if it is possible to calculate the consecutive ranges of a specific value for a group of IDs and return the calculated value(s) for each one.
Given the following data:
+----+----------+--------+
| ID | DATE_KEY | CREDIT |
+----+----------+--------+
| 1 | 8091 | 0.9 |
| 1 | 8092 | 20 |
| 1 | 8095 | 0.22 |
| 1 | 8096 | 0.23 |
| 1 | 8098 | 0.23 |
| 2 | 8095 | 12 |
| 2 | 8096 | 18 |
| 2 | 8097 | 3 |
| 2 | 8098 | 0.25 |
+----+----------+--------+
I want the following output:
+----+-------------------------------+
| ID | RANGE_DAYS_CREDIT_LESS_THAN_1 |
+----+-------------------------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 1 |
| 2 | 1 |
+----+-------------------------------+
In this case, the ranges are the consecutive days with credit less than 1. If there is a gap in the date_key column, the range must not carry over to the next value, as with ID 1 between date keys 8096 and 8098.
Is it possible to do this with windowing functions in Hive?
Thanks in advance!
You can do this with a running sum that classifies rows into groups, incrementing by 1 every time a credit >= 1 row is found (in date_key order), so that consecutive credit < 1 rows share a group. Thereafter it is just a group by.
select id, count(*) as range_days_credit_lt_1
from (select t.*
      ,sum(case when credit<1 then 0 else 1 end) over(partition by id order by date_key) as grp
      from tbl t
     ) t
where credit<1
group by id, grp
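This groups consecutive credit < 1 rows, but it treats 8096 and 8098 as consecutive because nothing checks the date gap, while the expected output splits them. A lag() check in the group marker handles that; a sketch, assuming date_key is a day number so any step larger than 1 is a gap:
select id, count(*) as range_days_credit_lt_1
from (select t.*
      ,sum(case when credit < 1
                 and date_key = lag(date_key) over(partition by id order by date_key) + 1
                then 0 else 1 end) over(partition by id order by date_key) as grp
      from tbl t
     ) t
where credit < 1
group by id, grp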
The key is to collapse each consecutive sequence and compute its length; I achieved this in a relatively clumsy way:
with t_test as
(
select num, row_number() over (order by num) as rn
from
(
select explode(array(1,3,4,5,6,9,10,15)) as num
) nums
)
select length(sign) + 1 from
(
select explode(continue_sign) as sign
from
(
select split(concat_ws('', collect_list(if(d > 1, 'v', cast(d as string)))), 'v') as continue_sign
from
(
select t0.num - t1.num as d from t_test t0
join t_test t1 on t0.rn = t1.rn + 1
) diffs
) signs
) runs
Get the previous number b in the sequence for each original a;
Check whether a-b > 1, which indicates a "gap", marked as 'v';
Merge all the a-b values into one string, split it on 'v', and compute the length of each piece plus one.
For the sample array the differences are 2,1,1,1,3,1,5, which become the string 'v111v1v'; splitting on 'v' gives pieces of length 0, 3, 1 and 0, so the run lengths are 1, 4, 2 and 1.
To get the ID column out as well, another string that encodes the id would have to be worked in.

How to pivot sql data, and squash results into non-null rows by Date per ID

Not a good title to the post, but hopefully it'll catch some eyes.
I have a very complex situation in T-SQL that I am unable to accomplish. I'm hoping someone with expertise knows an elegant and fast solution so that my performance is not impacted. I'm dealing with billions of rows.
PREFACE
I have a table called Customers with a unique ID. Those customers have Files, Files have Properties, and each Property Name corresponds to a single Value.
Tables:
Customers
Files
Property - contains both Name and Value
The Customer ID is present in all of these tables, as are audit fields such as UpdatedDtm and CreationDtm.
USE CASE
I need to join all customers to their files (filtering for a few) and then tie every file to their properties (again filtering these). This is easy but results in lots of rows, one for each customer x file x property.
I know that the property names will never change, and I want to return just a select few, so I used a pivot and got a nice table, but it fell apart once I started doing more complex queries.
THE PROBLEM
First, the properties have a DateTime for when they were altered (UpdatedDtm), and I need to return everything altered within 1 hour of the creation date (CreationDtm) in the File table.
This results in me trimming down my list of potential properties, but now I have a table with a ROW_NUMBER() per ID and no good way to pivot, select the first value that isn't null, and still preserve the number of columns for the table definition. This is important because I'm using dynamic SQL and placing it in an indexed temp table with a composite key on CustomerID and FileName.
BEFORE PIVOT
| UpdatedDtm | CustomerID | FileName | Property | Value |
| ---------- | ---------- | ---------- | -------- | -------------- |
| 1/1/2015 | 1 | FileOne | Size | NULL |
| 1/1/2015 | 1 | FileOne | Format | JPG |
| 1/7/2015 | 1 | FileOne | Size | 88KB |
| 1/7/2015 | 1 | FileOne | Format | JPG |
| 1/7/2015 | 1 | FileOne | Comment | NULL |
| 1/11/2015 | 1 | FileOne | Comment | NULL |
| 1/1/2015 | 1 | FileTwo | Size | 91KB |
| 1/1/2015 | 1 | FileTwo | Format | PNG |
| 1/11/2015 | 1 | FileTwo | Comment | NULL |
| 1/2/2015 | 2 | FileThree | Size | 74KB |
| 1/2/2015 | 2 | FileThree | Format | XLS |
| 1/2/2015 | 2 | FileThree | State | Open |
| 1/7/2015 | 2 | FileThree | State | Closed |
| 1/10/2015 | 2 | FileThree | Comment | NULL |
| 1/1/2015 | 3 | FileFour | Size | 2KB |
| 1/2/2015 | 3 | FileFour | Size | 10KB |
| 1/3/2015 | 3 | FileFour | Size | 13KB |
| 1/4/2015 | 3 | FileFour | Size | 21KB |
| 1/5/2015 | 3 | FileFour | Size | 27KB |
| 1/6/2015 | 3 | FileFour | Size | 32KB |
| 1/7/2015 | 3 | FileFour | Size | 39KB |
| 1/8/2015 | 3 | FileFour | Size | 44KB |
| 1/1/2015 | 3 | FileFour | Format | TXT |
| 1/1/2015 | 3 | FileFour | Comment | NULL |
Please don't ask me why the database is set up this way or to change the schema. That is set in stone and out of my control. I need to be able to solve the use case as described.
AFTER PIVOT (Expectation)
| CustomerID | FileName | Size | Format | State | Comment |
| ---------- | ---------- | ---- | ------ | ------ | ------- |
| 1 | FileOne | 88KB | JPG | NULL | NULL |
| 1 | FileTwo | 91KB | PNG | NULL | NULL |
| 2 | FileThree | 74KB | XLS | Closed | NULL |
| 3 | FileFour | 44KB | TXT | NULL | NULL |
I have included some NULL values and missing values to showcase that I need to preserve the same columns regardless of whether they have data, but I also need to squash the data down to the first non-null value within my date range.
CODE (My attempt)
IF Object_id('tempdb..#FilesQuery') IS NOT NULL DROP TABLE #FilesQuery;
CREATE TABLE #FilesQuery (
SeqNum int,
CustomerID numeric(16,0),
FileName varchar(64),
PropertyName varchar(64),
PropertyValue varchar(64)
)
INSERT INTO #FilesQuery
SELECT
CASE WHEN P.[Value] IS NOT NULL
THEN ROW_NUMBER() OVER (partition by C.CustomerID order by UpdatedDtm)
ELSE 0
END as SeqNum,
C.CustomerID
,F.Name as FileName
,P.Name as PropertyName
,P.Value as PropertyValue
FROM Customers C
INNER JOIN Files F ON F.CustomerID = C.CustomerID
LEFT JOIN Properties P
ON P.CustomerID = C.CustomerID
AND P.FileID = F.FileID
WHERE F.FileName IN ('FileOne','FileTwo','FileThree','FileFour')
AND P.Name IN ('Size','Format','State','Comment')
--PIVOT
DECLARE #cols AS nvarchar(MAX)
SELECT #cols = STUFF(
(SELECT DISTINCT ',' + QUOTENAME(PropertyName)
FROM #FilesQuery fq
FOR XML PATH(''),TYPE).value('.','NVARCHAR(MAX)'),1,1,'')
DECLARE #dynSql AS nvarchar(MAX)
SET #dynSql = '
SELECT DISTINCT *
FROM (
SELECT
fq.CustomerID,
fq.FileName,
fq.PropertyName,
fq.PropertyValue
FROM #FilesQuery fq
) SRC
PIVOT (
Max([PropertyValue])
FOR PropertyName IN (' + #cols + ')
) PVT
'
IF Object_id('tempdb..#Results') IS NOT NULL DROP TABLE #Results;
CREATE TABLE #Results (
CustomerID varchar(16) NOT NULL,
FileName varchar(64) NOT NULL,
FileSize varchar(64) NULL,
FileFormat varchar(64) NULL,
FileState varchar(64) NULL,
FileComment varchar(64) NULL,
CONSTRAINT pk_CustDoc PRIMARY KEY (CustomerID,FileName)
)
INSERT INTO #Results EXEC #dynSql;
I'm sorry this code isn't complete; it is the working section I have. The other attempts I made resulted in bad data pulls.
I tried using SeqNum and a combination of CASE statements to select the first non-null value for each row so that the data was all on one line, but it ended up looking more like:
FileOne NULL NULL Open NULL
FileOne NULL JPG NULL NULL
and so on...
I've been struggling to solve this special case for a while and am about to scrap it and do something procedural with looping, but that would kill my query time and performance.
Anyone have a good solution? Am I over-thinking things?
You should filter your data before you PIVOT and you will get your desired results. Here is a CTE version to show you the steps of how to get what you want.
;WITH cteDefineRowPrecedence AS (
SELECT *
,ROW_NUMBER() OVER (PARTITION BY CustomerId, FileName, Property ORDER BY
CASE WHEN Value IS NOT NULL THEN 0 ELSE 1 END
,UpdatedDtm DESC) as RowNum
FROM
#Table
)
, cteDesiredRows AS (
SELECT
CustomerId
,FileName
,Property
,Value
FROM
cteDefineRowPrecedence t
WHERE
t.RowNum = 1
AND t.Value IS NOT NULL
)
SELECT *
FROM
cteDesiredRows t
PIVOT (
MAX(Value)
FOR Property IN (Size,[Format],[State],Comment)
) p
ORDER BY
CustomerId
,FileName
And here is a nested query version that will be easier to embed in your dynamic SQL:
SELECT *
FROM
(
SELECT CustomerId, FileName, Property, Value
FROM
(SELECT *
,ROW_NUMBER() OVER (PARTITION BY CustomerId, FileName, Property ORDER BY
CASE WHEN Value IS NOT NULL THEN 0 ELSE 1 END
,UpdatedDtm DESC) as RowNum
FROM
#Table) r
WHERE
r.RowNum = 1
AND r.Value IS NOT NULL
) t
PIVOT (
MAX(Value)
FOR Property IN (Size,[Format],[State],Comment)
) p
ORDER BY
CustomerId
,FileName
You might need to add a WHERE condition inside the CTE definition to restrict the date/time range to what you want.
WITH CTE AS (
SELECT DISTINCT
CustomerID
, FileName
, Property
, Value
FROM
<table_name>
)
SELECT *
FROM
CTE
PIVOT (MAX(Value) FOR Property IN ([Size], [Format], [State], [Comment])) p
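The one-hour window from the question could be applied while building the pre-pivot rows, before either PIVOT above. A sketch, with the Files/Properties join and the CreationDtm/UpdatedDtm columns assumed from the question's description:
SELECT P.CustomerID, F.Name AS FileName, P.Name AS Property, P.Value, P.UpdatedDtm
FROM Properties P
JOIN Files F
  ON F.CustomerID = P.CustomerID
 AND F.FileID = P.FileID
WHERE P.UpdatedDtm >= F.CreationDtm
  AND P.UpdatedDtm <= DATEADD(HOUR, 1, F.CreationDtm);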

microsoft sql server - calculate return between every row and the last row

I have a table like the following:
+-------+--------------+
| Value | Date |
+-------+--------------+
| 14 | 10/11/2010 |
| 12 | 10/12/2010 |
| 12 | 10/13/2010 |
| 10 | 10/14/2010 |
| 8 | 10/15/2010 |
| 6 | 10/16/2010 |
| 4 | 10/17/2010 |
| 2 | 10/18/2010 |
+-------+--------------+
I would like to calculate the return (the quotient) between every row and the last row (the one with the latest date), e.g. for the row with date "10/16/2010" the result should be 6/2 = 3.
Hence, the resulting table should be
+-------+--------------+
| result| Date |
+-------+--------------+
| 7 | 10/11/2010 |
| 6 | 10/12/2010 |
| 6 | 10/13/2010 |
| 5 | 10/14/2010 |
| 4 | 10/15/2010 |
| 3 | 10/16/2010 |
| 2 | 10/17/2010 |
| 1 | 10/18/2010 |
+-------+--------------+
Is it possible to do this? Thank you!
You can get the value you want to divide by. Since that's always going to be a single row, you can just use a cross join to join to that and perform your division. SQL Fiddle
with maxdate as
(select max([Date]) as maxdate from table1),
divby as
(select
value as divby
from
table1
inner join maxdate md
on md.maxdate = table1.[date])
select
value / divby
,[date]
from
table1
cross join divby
To break it down a bit, the first CTE (cleverly named maxdate) gets the maximum date for the whole table. The second CTE (divby) gets the value (that you will be dividing by) for that max date. As long as you only get one row back from that, you can safely use a cross join, resulting in each row in your table being divided by that one value.
Another possible solution: JOIN the table to itself.
SQL Fiddle Example
select (t1b.value / t1a.value) as result, t1b.date
from table1 t1a
join table1 t1b on t1a.date = (select max(date) from table1)
Thanks for the fiddle, Andrew! Can be accomplished like this as well if 2008 and above (fiddle: http://sqlfiddle.com/#!3/ecda1/11):
SELECT [Value] / MIN([Value]) OVER () AS result,
[Date]
FROM Table1
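Dividing by MIN([Value]) OVER () only works here because the smallest value happens to sit on the latest date. If that is not guaranteed, anchoring on the date explicitly is safer (FIRST_VALUE needs SQL Server 2012 or later); a sketch:
SELECT [Value] / FIRST_VALUE([Value]) OVER (ORDER BY [Date] DESC) AS result,
       [Date]
FROM Table1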

Add and order row on SQL result

I'm looking for a way to always get a defined number of rows in my SQL result (Oracle).
Let me show you what I mean.
Here's an extract of my table
|----------|----------|----------|
| DATE | NUMBER | TYPE |
|----------|----------|----------|
| 12/01/13 | 2 | A |
| 12/01/13 | 4 | B |
| 12/02/13 | 3 | D |
| 12/02/13 | 1 | A |
| 12/02/13 | 5 | X |
|----------|----------|----------|
I need to always get 5 rows in my result for a chosen date, so the result has to be completed with new rows containing null values.
Here's what I hope the result will look like for 12/01/13:
|----------|----------|----------|
| DATE | NUMBER | TYPE |
|----------|----------|----------|
| 12/01/13 | 1 | null |
| 12/01/13 | 2 | A |
| 12/01/13 | 3 | null |
| 12/01/13 | 4 | B |
| 12/01/13 | 5 | null |
|----------|----------|----------|
The idea is the same for the other dates. I kind of did something with a lot of UNIONs but it wasn't working very well.
So how would you write this SELECT query?
Thanks
A partition outer join will fill in the gaps.
with cte_number_list as (
select rownum my_number
from dual
connect by level <= 5)
select t.my_date,
l.my_number,
t.type
from cte_number_list l left outer join
my_table t partition by (t.my_date)
on l.my_number = t.my_number;
http://docs.oracle.com/cd/E11882_01/server.112/e25554/analysis.htm#DWHSG02013
Off the top of my head, something like this should work:
CREATE TABLE #Numbers ( Number INT );
INSERT #Numbers (Number) VALUES (1), (2), (3), (4), (5);
SELECT Dates.Date, N.Number, MyTable.Type
FROM #Numbers N
CROSS JOIN (SELECT DISTINCT Date FROM MyTable) AS Dates
LEFT JOIN MyTable ON MyTable.Date = Dates.Date AND MyTable.Number = N.Number;
Edit: This example probably won't work with Oracle. I just realised that the question was regarding Oracle SQL after I wrote it. Leaving the answer here anyway. If someone knows how to translate it into Oracle SQL syntax, please feel free to edit my answer.
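A possible Oracle translation of the same idea (an untested sketch): the numbers 1-5 come from a CONNECT BY row generator instead of a temp table, the table is assumed to be called my_table, and the DATE/NUMBER/TYPE column names are quoted because they collide with Oracle reserved words.
WITH numbers AS (
  SELECT LEVEL AS num FROM dual CONNECT BY LEVEL <= 5
),
dates AS (
  SELECT DISTINCT "DATE" AS the_date FROM my_table
)
SELECT d.the_date, n.num, t."TYPE"
FROM numbers n
CROSS JOIN dates d
LEFT JOIN my_table t
  ON t."DATE" = d.the_date
 AND t."NUMBER" = n.num
ORDER BY d.the_date, n.num;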