Dynamic pivot for thousands of columns - sql

I'm using pgAdmin III / PostgreSQL 9.4 to store and work with my data. Sample of my current data:
x | y
--+--
0 | 1
1 | 1
2 | 1
5 | 2
5 | 2
2 | 2
4 | 3
6 | 3
2 | 3
How I'd like it to be formatted:
1, 2, 3 -- column names are unique y values
0, 5, 4 -- the first respective x values
1, 5, 6 -- the second respective x values
2, 2, 2 -- etc.
It would need to be dynamic because I have millions of rows and thousands of unique values for y.
Is using a dynamic pivot approach correct for this? I have not been able to successfully implement this:
DECLARE @columns VARCHAR(8000)
SELECT @columns = COALESCE(@columns + ',[' + cast(y as varchar) + ']',
'[' + cast(y as varchar)+ ']')
FROM tableName
GROUP BY y
DECLARE @query VARCHAR(8000)
SET @query = '
SELECT x
FROM tableName
PIVOT
(
MAX(x)
FOR [y]
IN (' + @columns + ')
)
AS p'
EXECUTE(@query)
It is stopping on the first line and giving the error:
syntax error at or near "@"
All dynamic pivot examples I've seen use this, so I'm not sure what I've done wrong. Any help is appreciated. Thank you for your time.
Note: It is important for the x values to be stored in the correct order, as sequence matters. I can add another column to indicate sequential order if necessary.

The term "first row" assumes a natural order of rows, which does not exist in database tables. So, yes, you need to add another column to indicate sequential order like you suspected. I am assuming a column tbl_id for the purpose. Using the ctid would be a measure of last resort. See:
Deterministic sort order for window functions
The code you present looks like MS SQL Server code; invalid syntax for Postgres.
For millions of rows and thousands of unique values for Y it wouldn't even make sense to try and return individual columns. Postgres has generous limits, but not nearly generous enough for that. According to the source code or the manual, the absolute maximum number of columns is 1600.
So we don't even get to discuss the restrictive characteristics of SQL, which demands to know result columns and their data types when the query is called; they cannot be adjusted dynamically during execution. You would need two separate calls, as discussed in great detail under this related question:
Dynamic alternative to pivot with CASE and GROUP BY
Another answer by Clodoaldo under the same question returns arrays. That can actually be completely dynamic. And that's what I suggest here, too. The query is actually rather simple:
WITH cte AS (
SELECT *, row_number() OVER (PARTITION BY y ORDER BY tbl_id) AS rn
FROM tbl
ORDER BY y, tbl_id
)
SELECT text 'y' AS col, array_agg (y) AS values
FROM cte
WHERE rn = 1
UNION ALL
( -- parentheses required
SELECT text 'x' || rn, array_agg (x)
FROM cte
GROUP BY rn
ORDER BY rn
);
Result:
col | values
----+--------
y | {1,2,3}
x1 | {0,5,4}
x2 | {1,5,6}
x3 | {2,2,2}
Explanation
The CTE computes a row_number rn for each row (each x) per group of y. We are going to use it twice, hence the CTE.
The 1st SELECT in the outer query generates the array of y values.
The 2nd SELECT in the outer query generates all arrays of x values in order. Arrays can have different length.
Why the parentheses for UNION ALL? See:
Sum results of a few queries and then find top 5 in SQL
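To make the logic of the query easier to follow, here is a minimal Python sketch of the same idea, not the Postgres implementation itself. It assumes each row carries the hypothetical ordering column tbl_id, numbers the x values within each y group, and then collects one array per row number. The names pivot_to_arrays and sample_rows are made up for illustration.

```python
from collections import defaultdict

# Sample data from the question as (tbl_id, x, y); tbl_id is the assumed
# ordering column that defines "first", "second", ... within each y.
sample_rows = [
    (1, 0, 1), (2, 1, 1), (3, 2, 1),
    (4, 5, 2), (5, 5, 2), (6, 2, 2),
    (7, 4, 3), (8, 6, 3), (9, 2, 3),
]


def pivot_to_arrays(rows):
    """Return [(label, values), ...]: one array of y values, then one array per row number."""
    # Collect x values per y, ordered by tbl_id
    # (mirrors row_number() OVER (PARTITION BY y ORDER BY tbl_id)).
    groups = defaultdict(list)
    for _tbl_id, x, y in sorted(rows, key=lambda r: (r[2], r[0])):
        groups[y].append(x)

    ys = sorted(groups)
    result = [("y", ys)]

    longest = max((len(xs) for xs in groups.values()), default=0)
    for rn in range(longest):
        # Like GROUP BY rn: only groups that have an rn-th element contribute,
        # so later arrays can be shorter than earlier ones.
        values = [groups[y][rn] for y in ys if rn < len(groups[y])]
        result.append((f"x{rn + 1}", values))
    return result


for label, values in pivot_to_arrays(sample_rows):
    print(label, values)
```

Because every output row is a single array, the shape of the result never depends on how many distinct y values exist, which is what sidesteps the column limit.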

Related

WHILE Window Operation with Different Starting Point Values From Column - SQL Server [duplicate]

In SQL there are aggregation operators, like AVG, SUM, COUNT. Why doesn't it have an operator for multiplication? "MUL" or something.
I was wondering, does it exist for Oracle, MSSQL, or MySQL? If not, is there a workaround that would give this behaviour?
By MUL do you mean progressive multiplication of values?
Even with 100 rows of some small size (say 10s), your MUL(column) is going to overflow any data type! With such a high probability of mis/ab-use, and very limited scope for use, it does not need to be a SQL Standard. As others have shown there are mathematical ways of working it out, just as there are many many ways to do tricky calculations in SQL just using standard (and common-use) methods.
Sample data:
Column
1
2
4
8
COUNT : 4 items (1 for each non-null)
SUM : 1 + 2 + 4 + 8 = 15
AVG : 3.75 (SUM/COUNT)
MUL : 1 x 2 x 4 x 8 ? ( =64 )
For completeness, the Oracle, MSSQL, MySQL core implementations *
Oracle : EXP(SUM(LN(column))) or POWER(N,SUM(LOG(N, column)))
MSSQL : EXP(SUM(LOG(column))) or POWER(N,SUM(LOG(column)/LOG(N)))
MySQL : EXP(SUM(LOG(column))) or POW(N,SUM(LOG(N,column)))
Take care when using EXP/LOG in SQL Server and watch the return type (both return float): http://msdn.microsoft.com/en-us/library/ms187592.aspx
The POWER form allows for larger numbers (using bases larger than Euler's number). In cases where the result grows too large to turn it back using POWER, you can return just the logarithmic value and calculate the actual number outside of the SQL query.
* LOG(0) and LOG(-ve) are undefined. The below shows only how to handle this in SQL Server. Equivalents can be found for the other SQL flavours, using the same concept
create table MUL(data int)
insert MUL select 1 yourColumn union all
select 2 union all
select 4 union all
select 8 union all
select -2 union all
select 0
select CASE WHEN MIN(abs(data)) = 0 then 0 ELSE
EXP(SUM(Log(abs(nullif(data,0))))) -- the base mathematics
* round(0.5-count(nullif(sign(sign(data)+0.5),1))%2,0) -- pairs up negatives
END
from MUL
Ingredients:
Taking the abs() of data: if the min is 0, multiplying by whatever else is futile, the result is 0.
When data is 0, NULLIF converts it to NULL. abs() and log() then both return NULL, so the row is excluded from SUM().
If data is not 0, abs() allows us to multiply a negative number using the LOG method; we keep track of the sign elsewhere.
Working out the final sign
sign(data) returns 1 for >0, 0 for 0 and -1 for <0.
We add another 0.5 and take the sign() again, so we have now classified 0 and 1 both as 1, and only -1 as -1.
again use NULLIF to remove from COUNT() the 1's, since we only need to count up the negatives.
% 2 against the count() of negative numbers returns either
--> 1 if there is an odd number of negative numbers
--> 0 if there is an even number of negative numbers
more mathematical tricks: we take 1 or 0 off 0.5, so that the above becomes
--> (0.5-1=-0.5=>round to -1) if there is an odd number of negative numbers
--> (0.5-0= 0.5=>round to 1) if there is an even number of negative numbers
We multiply this final 1/-1 by the product computed via SUM/LOG to get the real result.
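The ingredients above can be sketched outside SQL as well. The following Python function is only an illustration of the same arithmetic (zero short-circuit, logarithms of absolute values, parity of the negative count), not a translation of the T-SQL expression; the name product_via_logs is made up here.

```python
import math


def product_via_logs(values):
    """Product of values computed as exp(sum(log(|v|))), with the sign handled separately."""
    # SQL aggregates skip NULLs; do the same with None.
    data = [v for v in values if v is not None]
    if not data:
        return None  # no rows: the aggregate has nothing to return

    # Any zero makes the whole product zero, and log(0) is undefined anyway.
    if any(v == 0 for v in data):
        return 0.0

    # The base mathematics: the sum of logs is the log of the product.
    magnitude = math.exp(sum(math.log(abs(v)) for v in data))

    # An odd number of negative values flips the sign of the result.
    negatives = sum(1 for v in data if v < 0)
    sign = -1.0 if negatives % 2 else 1.0
    return sign * magnitude


print(product_via_logs([1, 2, 4, 8, -2]))
```

The result is a floating-point approximation, so expect a value very close to the exact product rather than the exact integer.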
No, but you can use Mathematics :)
if yourColumn is always bigger than zero:
select EXP(SUM(LOG(yourColumn))) As ColumnProduct from yourTable
I see an Oracle answer is still missing, so here it is:
SQL> with yourTable as
2 ( select 1 yourColumn from dual union all
3 select 2 from dual union all
4 select 4 from dual union all
5 select 8 from dual
6 )
7 select EXP(SUM(LN(yourColumn))) As ColumnProduct from yourTable
8 /
COLUMNPRODUCT
-------------
64
1 row selected.
Regards,
Rob.
With PostgreSQL, you can create your own aggregate functions, see http://www.postgresql.org/docs/8.2/interactive/sql-createaggregate.html
To create an aggregate function on MySQL, you'll need to build an .so (linux) or .dll (windows) file. An example is shown here: http://www.codeproject.com/KB/database/mygroupconcat.aspx
I'm not sure about mssql and oracle, but i bet they have options to create custom aggregates as well.
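To show what a user-defined aggregate looks like in practice, here is a small sketch using Python's built-in sqlite3 module instead of PostgreSQL or MySQL (a deliberate swap, because it runs without a server). The aggregate name mul, the class Product and the helper column_product are made up for this example.

```python
import sqlite3


class Product:
    """Aggregate state: sqlite3 calls step() once per row and finalize() at the end."""

    def __init__(self):
        self.result = None

    def step(self, value):
        if value is None:  # skip NULLs like the built-in aggregates do
            return
        self.result = value if self.result is None else self.result * value

    def finalize(self):
        return self.result


def column_product(values):
    """Load values into an in-memory table and multiply them with the custom aggregate."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("mul", 1, Product)  # register mul(x), taking one argument
    conn.execute("CREATE TABLE t (v INTEGER)")
    conn.executemany("INSERT INTO t (v) VALUES (?)", [(v,) for v in values])
    (result,) = conn.execute("SELECT mul(v) FROM t").fetchone()
    conn.close()
    return result


print(column_product([1, 2, 4, 8]))
```

As noted elsewhere in this thread, a real product overflows quickly; this sketch makes no attempt to guard against results that no longer fit a 64-bit integer.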
You'll break any datatype fairly quickly as numbers mount up.
Using LOG/EXP is tricky because of numbers <= 0 that will fail when using LOG. I wrote a solution in this question that deals with this
Using CTE in MS SQL:
CREATE TABLE Foo(Id int, Val int)
INSERT INTO Foo VALUES(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)
;WITH cte AS
(
SELECT Id, Val AS Multiply, row_number() over (order by Id) as rn
FROM Foo
WHERE Id=1
UNION ALL
SELECT ff.Id, cte.multiply*ff.Val as multiply, ff.rn FROM
(SELECT f.Id, f.Val, (row_number() over (order by f.Id)) as rn
FROM Foo f) ff
INNER JOIN cte
ON ff.rn -1= cte.rn
)
SELECT * FROM cte
Not sure about Oracle or sql-server, but in MySQL you can just use * like you normally would.
mysql> select count(id), count(id)*10 from tablename;
+-----------+--------------+
| count(id) | count(id)*10 |
+-----------+--------------+
| 961 | 9610 |
+-----------+--------------+
1 row in set (0.00 sec)

Get Distinct value from a list in SQL Server

I have a DB column that has a comma delimited list:
VALUES ID
--------------------
1,11,32 A
11,12,28 B
1 C
32,12,1 D
When I run my SQL statement, in my WHERE clause I have tried IN, CONTAINS and LIKE with varying degrees of errors and success, but none offer an exact return of what I need.
What I need is a WHERE clause that returns all IDs that have the value '1' (NOT the number) in the list.
Example of problem:
WHERE values like (1)
This will return A,B,C,D because 1 is included in the value (11). I would expect IDs (A,C,D).
WHERE values like (2)
This will return A,B,D because 2 is included in the value (32,28,12). I would expect zero records.
Thanks in advance for your help!
I will begin my answer by quoting the spot-on comment given by @Jarlh above:
Never, ever store data as comma separated items. It will only cause you lots of trouble.
That being said, if you're really stuck with this design, you could use:
SELECT *
FROM yourTable
WHERE ',' + [VALUES] + ',' LIKE '%,1,%';
The trick here is to convert every VALUES string into something looking like:
,11,12,28,
Then, we can search for a target number with comma delimiters on both sides. Since we placed commas at both ends, every number in the CSV list is now guaranteed to have commas around it.
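The same delimiter trick is easy to check outside the database. This is a small Python sketch of the string logic only, using the sample data from the question; contains_element and ids_containing are made-up helper names.

```python
# Sample data from the question: ID -> comma-delimited list.
rows = {"A": "1,11,32", "B": "11,12,28", "C": "1", "D": "32,12,1"}


def contains_element(csv_list, target):
    """True if target is a whole element of the comma-delimited list."""
    # Wrap both sides in commas so every element, including the first and
    # the last, is surrounded by delimiters, then do a plain substring search.
    return f",{target}," in f",{csv_list},"


def ids_containing(table, target):
    """IDs whose list contains target as an exact element."""
    return [row_id for row_id, csv_list in table.items()
            if contains_element(csv_list, target)]


print(ids_containing(rows, "1"))
print(ids_containing(rows, "2"))
```

Searching for "1" no longer matches 11 or 12, and searching for "2" matches nothing, which is the behaviour the question asks for.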
If you are stuck with such a poor data model, I would suggest:
select t.*
from t
where exists (select 1
from string_split(t.[values], ',') s
where s.value = '1'
);
I echo exactly what jarlh and Tim say: a relational table is not the right place to store comma-delimited strings.
Here is an approach, that can likely use an index if there is one on column x
select distinct x
from t
cross apply string_split(t.x,',')
where value=1 /*out here you may parameterize, and also could make use of an index each if there is one in value*/
+---------+
| x |
+---------+
| 1 |
| 1,11,32 |
| 32,12,1 |
+---------+
working example
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=b9b3084f52b0f42ffd17d90427016999
--SQL Server older versions
with data
as (
SELECT t.c.value('.', 'VARCHAR(1000)') as val
,y
,x
FROM (
SELECT x1 = CAST('<t>' +
REPLACE(x , ',', '</t><t>') + '</t>' AS XML)
,y
,x
FROM t
) a
CROSS APPLY x1.nodes('/t') t(c)
)
select distinct x
from data
where val = '1'
+---------+
| x |
+---------+
| 1 |
| 1,11,32 |
| 32,12,1 |
+---------+
working example
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=011a096bbdd759ea5fe3aa74b08bc895

SQL Combining the top 2 field values into 1 value

I have a very simple query that returns the Notes field. Since there can be multiple notes, I only want the top 2. No problem. However, I'm going to be using the sql within another query. I really don't want 2 lines in my results. I would like to combine the results into 1 field value so I only have 1 result line in the results. Is this possible?
For example, I currently get the following:
12345 1001 500.00 "Note 1"
12345 1001 500.00 "Note 2"
What I would like to see is this:
12345 1001 500.00 "Note 1 AND Note 2"
Following is the sql:
select top 2 rcai.field_value
from rnt_agrs ra
inner join rnt_agr_inv_notes rain on ra.rnt_agr_nbr=rain.rea_rnt_agr_nbr
inner join RNT_CUST_ADDNL_INFO rcai on rain.rea_rnt_agr_nbr=rcai.rea_rnt_agr_nbr and rain.bac_acc_id=rcai.bac_acct_id
where ra.rnt_agr_nbr=128260511
Thanks for your help. I appreciate this forum for help with these issues.....
Get the next row's value and filter all but the first row:
select ..., rcai.field_value || ' AND ' ||
min(rcai.field_value) -- next row's value (same as LEAD in Standard SQL)
over (partition by ra.rnt_agr_nbr
order by rcai.field_value
rows between 1 following and 1 following) as next_field_value
from rnt_agrs ra
inner join rnt_agr_inv_notes rain on ra.rnt_agr_nbr=rain.rea_rnt_agr_nbr
inner join RNT_CUST_ADDNL_INFO rcai on rain.rea_rnt_agr_nbr=rcai.rea_rnt_agr_nbr and rain.bac_acc_id=rcai.bac_acct_id
where ra.rnt_agr_nbr=128260511
qualify
row_number() -- only the first row
over (partition by ra.rnt_agr_nbr
order by rcai.field_value) = 1
If there might be only a single row you need to add a COALESCE(min...,'') to get rid of the NULL.
Both OLAP functions specify the same PARTITION and ORDER, so this is a single working step.
select *,(SELECT top 2 rcai.field_value + ' AND ' AS [text()]
FROM RNT_CUST_ADDNL_INFO rcai
WHERE rcai.rea_rnt_agr_nbr = rain.rea_rnt_agr_nbr
AND rcai.bac_acct_id=rain.bac_acc_id
FOR XML PATH('')) AS Notes
from
rnt_agrs ra inner join rnt_agr_inv_notes rain
on ra.rnt_agr_nbr=rain.rea_rnt_agr_nbr
I had something like this, where there was a 1 to many, and I wanted a semicolon delimited set of values in a single column with the main record.
You could use PIVOT to transform the two note rows into two note columns based on row number, then concatenate them. Here's an example:
SELECT pvt.[1] + ' and ' + pvt.[2]
FROM
( --the selection of your table data, including a row-number column
SELECT Msg, ROW_NUMBER() OVER(ORDER BY Id)
--sample data shown here, but this would be your real table
FROM (VALUES(1, 'Note 1'), (2, 'Note 2'), (3, 'Note 3')) Note(Id, Msg)
) Data (Msg, Row)
PIVOT (MAX(Msg) FOR Row IN ([1], [2])) pvt
Note that MAX is used for the aggregate in the PIVOT since an aggregate is required, but since ROW_NUMBER is unique, you're only aggregating a single value.
This could also be easily extended to the first N rows - just include the row numbers you want in the pivot and combine them as desired in the select statement.

how to select one tuple in rows based on variable field value

I'm quite new to SQL and I'd like to write a SELECT statement that retrieves only the first row of each set, based on a column value. I'll try to make it clearer with a table example.
Here is my table data :
chip_id | sample_id
-------------------
1 | 45
1 | 55
1 | 5986
2 | 453
2 | 12
3 | 4567
3 | 9
I'd like to have a SELECT statement that fetches the first row for each of chip_id = 1, 2, 3
Like this :
chip_id | sample_id
-------------------
1 | 45 or 55 or whatever
2 | 12 or 453 ...
3 | 9 or ...
How can I do this?
Thanks
i'd probably:
set a variable =0
order your table by chip_id
read the table in row by row
if table[row]>variable, store the table[row] in a result array,increment variable
loop till done
return your result array
though depending on your DB,query and versions you'll probably get unpredictable/unreliable returns.
You can get one value using row_number():
select chip_id, sample_id
from (select chip_id, sample_id,
row_number() over (partition by chip_id order by rand()) as seqnum
from your_table -- placeholder for your actual table name
) t
where seqnum = 1
This returns a random value. In SQL, tables are inherently unordered, so there is no concept of "first". You need an auto incrementing id or creation date or some way of defining "first" to get the "first".
If you have such a column, then replace rand() with the column.
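For readers who want to see the "one row per group" logic spelled out, here is a minimal Python sketch. It assumes, purely for illustration, that "first" means the smallest sample_id within each chip_id; any other ordering column can be passed as order_key. The function name first_row_per_chip is made up.

```python
# Sample data from the question as (chip_id, sample_id).
rows = [(1, 45), (1, 55), (1, 5986), (2, 453), (2, 12), (3, 4567), (3, 9)]


def first_row_per_chip(table, order_key=lambda row: row[1]):
    """Keep, for each chip_id, the row that sorts first under order_key."""
    best = {}
    for row in table:
        chip_id = row[0]
        # Equivalent to keeping only the row with row_number() = 1 per partition.
        if chip_id not in best or order_key(row) < order_key(best[chip_id]):
            best[chip_id] = row
    return [best[chip_id] for chip_id in sorted(best)]


print(first_row_per_chip(rows))
```

Without an explicit ordering key the choice of row is arbitrary, which is exactly the point made above about tables being unordered.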
Provided I understood your output, if you are using PostgreSQL 9, you can use this:
SELECT chip_id ,
string_agg(sample_id::text, ' or ')
FROM your_table
GROUP BY chip_id
You need to group your data with a GROUP BY query.
When you group, generally you want the max, the min, or some other values to represent your group. You can do sums, count, all kind of group operations.
For your example, you don't seem to want a specific group operation, so the query could be as simple as this one :
SELECT chip_id, MAX(sample_id)
FROM table
GROUP BY chip_id
This way you are retrieving the maximum sample_id for each of the chip_id.

how to find continuous series using pl/sql

I am a PL/SQL programmer facing a problem in finding continuity in a series for the same date.
Suppose I have a series like
1000,1001,
1002,1003,
1004,1005,
1016,1017,
1018,1019,
1020,1021,
1035,1036,
1037,1038,
1039,1040
and I am looking for output like
from_series ------------- to_series
1000 ------------- 1005
1016 ------------- 1021
1035 ------------- 1040
I tried it with the following queries, but ran into a problem:
SELECT *
FROM retort_t r
WHERE NOT EXISTS
(
SELECT 'X'
FROM retort_t
WHERE r.series_NO - ISSUE_NO = 1 );
SELECT *
FROM retort_t r
WHERE NOT EXISTS
(
SELECT 'X'
FROM retort_t
WHERE ISSUE_NO = r.series_NO + 1 );
I am getting the result by joining the two queries above. That is OK for a few records, but I have lakhs (hundreds of thousands) of records, and joining these two queries takes a long time to fetch the data.
Please let me know an appropriate way to sort the data into the correct intervals.
Assuming a simple table structure such as:
CREATE TABLE T (x INT);
INSERT INTO T (x) VALUES
(1000), (1001), (1002), (1003),
(1004), (1005), (1016), (1017),
(1018), (1019), (1020), (1021),
(1035), (1036), (1037), (1038),
(1039), (1040);
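You can use ROW_NUMBER() to get a value that stays constant across a run of sequential numbers: within a consecutive run, x and its row number both increase by 1, so their difference does not change. You can then group by this value to get the min and max values in each range: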
SELECT MIN(x) AS RangeStart, MAX(x) AS RangeEnd
FROM ( SELECT X,
X - ROW_NUMBER() OVER(ORDER BY x) AS GroupBy
FROM T
) t
GROUP BY GroupBy;
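The grouping trick can be traced step by step in a few lines of Python. This is only a sketch of the same arithmetic, assuming the values are distinct (duplicates would break the "x minus row number" invariant, just as they would with ROW_NUMBER()); find_ranges and series are made-up names.

```python
# The series from the question: three runs of consecutive numbers.
series = list(range(1000, 1006)) + list(range(1016, 1022)) + list(range(1035, 1041))


def find_ranges(values):
    """Return (range_start, range_end) for each run of consecutive integers."""
    groups = {}
    # enumerate(..., start=1) plays the role of ROW_NUMBER() OVER (ORDER BY x).
    for row_number, x in enumerate(sorted(values), start=1):
        key = x - row_number  # constant within a run of consecutive values
        lo, hi = groups.get(key, (x, x))
        groups[key] = (min(lo, x), max(hi, x))  # MIN(x), MAX(x) per group
    return sorted(groups.values())


for start, end in find_ranges(series):
    print(start, end)
```

Each gap in the series increases x faster than the row number, so the difference jumps to a new value and starts a new group.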