Group and split records in Postgres into several new column series

I have data of the form:
-----------------------------|
6031566779420 | 25 | 163698 |
6031566779420 | 50 | 98862 |
6031566779420 | 75 | 70326 |
6031566779420 | 95 | 51156 |
6031566779420 | 100 | 43788 |
6036994077620 | 25 | 41002 |
6036994077620 | 50 | 21666 |
6036994077620 | 75 | 14604 |
6036994077620 | 95 | 11184 |
6036994077620 | 100 | 10506 |
------------------------------
I would like to create a dynamic number of new columns by treating each series of (25, 50, 75, 95, 100) rows and their corresponding values as a new series. The target output I'm looking for is:
--------------------------
| 25 | 163698 | 41002 |
| 50 | 98862 | 21666 |
| 75 | 70326 | 14604 |
| 95 | 51156 | 11184 |
| 100 | 43788 | 10506 |
--------------------------
I'm not sure what the SQL/Postgres operation I want is called, nor how to achieve it. In this case the data produces 2 new columns, but I'm trying to formulate a solution that has as many new columns as there are groups of data in the output of the original query.
[Edit]
Thanks for the references to array_agg, that looks like it would be helpful! I should've mentioned this earlier but I'm using Redshift which reports this version of Postgres:
PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.1007
and it does not seem to support this function yet.
ERROR: function array_agg(numeric) does not exist
HINT: No function matches the given name and argument types. You may need to add explicit type casts.
Is crosstab the kind of transformation I should be looking at, or something else? Thanks again.

I've used array_agg() here, assuming the table is t with columns idx and val:
select idx,array_agg(val)
from t
group by idx
This will produce result like below:
idx array_agg
--- --------------
25 {163698,41002}
50 {98862,21666}
75 {70326,14604}
95 {11184,51156}
100 {43788,10506}
As you can see, the second column is an array of the two values that correspond to each idx.
The following SELECT queries return the result as two separate columns:
Method 1:
SELECT idx
,col[1] col1 --First value in the array
,col[2] col2 --Second value in the array
FROM (
SELECT idx
,array_agg(val) col
FROM t
GROUP BY idx
) s
Method 2:
SELECT idx
,(array_agg(val))[1] col1 --First value in the array
,(array_agg(val))[2] col2 --Second value in the array
FROM t
GROUP BY idx
Result:
idx col1 col2
--- ------ -----
25 163698 41002
50 98862 21666
75 70326 14604
95 11184 51156
100 43788 10506
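The grouping step can be mimicked in plain Python to see what array_agg collects per idx. This is a sketch, assuming rows arrive as (series_id, idx, val) tuples; series_id is a hypothetical name for the unnamed first column. Note that an aggregate without an ORDER BY makes no promise about element order (the 95 row above has its two values swapped relative to the question), so the sketch sorts by series_id explicitly, which is what array_agg(val ORDER BY series_id) would do in PostgreSQL 9.0 and later.

```python
from collections import defaultdict

# Sample rows from the question: (series_id, idx, val).
# "series_id" is a hypothetical name for the unnamed first column.
ROWS = [
    (6031566779420, 25, 163698),
    (6031566779420, 50, 98862),
    (6031566779420, 75, 70326),
    (6031566779420, 95, 51156),
    (6031566779420, 100, 43788),
    (6036994077620, 25, 41002),
    (6036994077620, 50, 21666),
    (6036994077620, 75, 14604),
    (6036994077620, 95, 11184),
    (6036994077620, 100, 10506),
]


def group_by_idx(rows):
    """Emulate SELECT idx, array_agg(val ORDER BY series_id) FROM t GROUP BY idx."""
    groups = defaultdict(list)
    # Sorting by series_id first fixes the order of values inside each group.
    for series_id, idx, val in sorted(rows):
        groups[idx].append(val)
    return dict(sorted(groups.items()))


for idx, vals in group_by_idx(ROWS).items():
    print(idx, vals)
```

With the explicit sort, the 95 row comes out as [51156, 11184], matching the column order the question asks for.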

You can use the array_agg function. Assuming your columns are named A, B, C:
SELECT B, array_agg(C)
FROM table_name
GROUP BY B
Will get you output in array form. This is as close as you can get to variable columns in a simple query. If you really need variable columns, consider defining a PL/pgSQL procedure to convert array into columns.
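Since Redshift rejects array_agg, one alternative (a different technique from the answers above) is conditional aggregation, MAX(CASE ...) per series, with the column list generated by a program so it grows with the number of series. The sketch below assumes a table t(series_id, idx, val) with hypothetical column names and runs the generated SQL on SQLite through Python's sqlite3 module purely for illustration; the query pattern itself is plain SQL, but verify it against your own Redshift cluster.

```python
import sqlite3


def build_pivot_sql(series_ids):
    """Build one MAX(CASE ...) column per series (conditional aggregation)."""
    cols = ",\n  ".join(
        # int() guards against injecting anything but a number into the SQL.
        f"MAX(CASE WHEN series_id = {int(s)} THEN val END) AS s{n}"
        for n, s in enumerate(series_ids, start=1)
    )
    return f"SELECT idx,\n  {cols}\nFROM t\nGROUP BY idx\nORDER BY idx"


def pivot(conn):
    # First query: discover how many series (output columns) exist.
    ids = [r[0] for r in conn.execute(
        "SELECT DISTINCT series_id FROM t ORDER BY series_id")]
    # Second query: the generated pivot, one column per series.
    return conn.execute(build_pivot_sql(ids)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (series_id INTEGER, idx INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (6031566779420, 25, 163698), (6031566779420, 50, 98862),
        (6031566779420, 75, 70326), (6031566779420, 95, 51156),
        (6031566779420, 100, 43788), (6036994077620, 25, 41002),
        (6036994077620, 50, 21666), (6036994077620, 75, 14604),
        (6036994077620, 95, 11184), (6036994077620, 100, 10506),
    ],
)
for row in pivot(conn):
    print(row)
```

The two-step design (first list the series, then build the query) is what makes the column count dynamic; a single static SQL statement cannot change its number of output columns based on the data.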

Related

bit varying in Postgres to be queried by sub-string pattern

The following Postgres table contains some sample content where the binary data is stored as bit varying (https://www.postgresql.org/docs/10/datatype-bit.html):
ID | Binary data
----------------------
1 | 01110
2 | 0111
3 | 011
4 | 01
5 | 0
6 | 00011
7 | 0001
8 | 000
9 | 00
10 | 0
11 | 110
12 | 11
13 | 1
Q: Is there a query (either native SQL or a Postgres function) that returns all rows whose binary data field equals any leading sub-string (prefix) of the target bit string? To make it clearer, let's look at the example search value 01101:
01101 -> no result
0110 -> no result
011 -> 3
01 -> 4
0 -> 5, 10
The result returned should contain the rows: 3, 4, 5 and 10.
Edit:
The working query is (thanks to Laurenz Albe):
SELECT * FROM table WHERE '01101' LIKE (table.binary_data::text || '%')
Furthermore I found this discussion about Postgres bit with fixed size vs bit varying helpful:
PostgreSQL Bitwise operators with bit varying "cannot AND bit strings of different sizes"
How about
WHERE '01101' LIKE (col2::text || '%')
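The LIKE condition simply asks whether the target starts with the stored bit string. A minimal Python equivalent on the sample rows, using str.startswith in place of LIKE (safe here because bit strings contain only 0 and 1, so none of LIKE's wildcard characters can appear in the stored values):

```python
# Sample rows from the question: (ID, binary data as text).
ROWS = [
    (1, "01110"), (2, "0111"), (3, "011"), (4, "01"), (5, "0"),
    (6, "00011"), (7, "0001"), (8, "000"), (9, "00"), (10, "0"),
    (11, "110"), (12, "11"), (13, "1"),
]


def prefix_matches(rows, target):
    """Python equivalent of WHERE target LIKE (binary_data::text || '%')."""
    return [row_id for row_id, bits in rows if target.startswith(bits)]


print(prefix_matches(ROWS, "01101"))  # [3, 4, 5, 10]
```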
I think you are looking for bitwise and:
where col2 & B'01101' = col2
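Two caveats on the bitwise version, sketched in Python with bit strings as text (the helper names are hypothetical): PostgreSQL refuses to AND bit strings of different lengths (the "cannot AND bit strings of different sizes" error mentioned in the question), and even at equal length, col2 & target = col2 tests whether every 1-bit of col2 is also set in the target, which is not the same as being a prefix.

```python
def bit_and(a: str, b: str) -> str:
    """Bitwise AND of two bit strings; like PostgreSQL, reject unequal lengths."""
    if len(a) != len(b):
        raise ValueError("cannot AND bit strings of different sizes")
    return "".join("1" if x == "1" and y == "1" else "0" for x, y in zip(a, b))


def is_bit_subset(col: str, target: str) -> bool:
    """The answer's test: col & target = col."""
    return bit_and(col, target) == col


# '01001' passes the subset test against '01101' but is not a prefix of it.
print(is_bit_subset("01001", "01101"))  # True
print("01101".startswith("01001"))      # False
```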

How do I compare rows of a table against all other rows of the table?

I would like to create a script that finds the rows of a table whose ASCII sums differ by a specific amount, and then either adds those rows to a separate table or flags a different field on them.
For instance, I am looking to find when the ASCII sum of word A and the ASCII sum of word B, both stored in rows of a table, have a difference of 63 or 31.
I could probably use a loop to select these rows, but SQL is not my greatest virtue.
ItemID | asciiSum |ProperDiff
-------|----------|----------
1 | 100 |
2 | 37 |
3 | 69 |
4 | 23 |
5 | 6 |
6 | 38 |
After running the code, the field ProperDiff should contain 'yes' for ItemIDs 1, 2, 3, 5 and 6, since, for example, the asciiSum values of items 1 and 2 differ by 63 (100 - 37 = 63).
This will not be fast, but I think it does what you want:
update t
set ProperDiff = 'yes'
where exists (select 1
from t t2
where abs(t2.AsciiSum - t.AsciiSum) in (63, 31)
);
It should work okay on small tables.
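To check the UPDATE against the sample data, here is a sketch that runs the same statement on an in-memory SQLite database via Python's sqlite3 module (SQLite stands in for Postgres here only for illustration). A row never matches itself, because its own difference is 0, and item 4 (23) is 63 or 31 away from no other row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ItemID INTEGER, AsciiSum INTEGER, ProperDiff TEXT)")
conn.executemany(
    "INSERT INTO t (ItemID, AsciiSum) VALUES (?, ?)",
    [(1, 100), (2, 37), (3, 69), (4, 23), (5, 6), (6, 38)],
)

# Same correlated UPDATE as the answer: flag a row if any row's sum
# differs from its own by exactly 63 or 31.
conn.execute(
    """
    UPDATE t
    SET ProperDiff = 'yes'
    WHERE EXISTS (SELECT 1
                  FROM t t2
                  WHERE abs(t2.AsciiSum - t.AsciiSum) IN (63, 31))
    """
)

flagged = [r[0] for r in conn.execute(
    "SELECT ItemID FROM t WHERE ProperDiff = 'yes' ORDER BY ItemID")]
print(flagged)  # [1, 2, 3, 5, 6]
```

The subquery is evaluated once per row, which is why the answer warns it will not be fast on large tables.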

Find referenced value of multiple columns

I have a table setpoints with three columns (base, effective and actual), each containing an id that refers to a row in the io table.
I would like to make a query that will return the io_value found in the io table for the id found in Setpoints.
Currently my query will return multiple id's and then I query the io table to find the io_value for each id.
Ex Query returning the ID's in the row
row # | base | effective | actual
1 | 24 | 30 | 40
2 | 25 | 31 | 41
3 | 26 | 32 | 42
But I want it to return the values instead of the ids.
Ex returning the value for the id's instead
row # | base | effective | actual
1 | 2.3 | 4.5 | 3.44
2 | 4.2 | 7.7 | 4.41
3 | 3.9 | 8.12 | 5.42
Here are the table fields
IO
io_value
io_id
Setpoints
stpt_base
stpt_effective
stpt_actual
Using postgres 9.5
What I'm using now:
SELECT * FROM setpoints;
-- then, for each row returned:
SELECT io_id, io_value
FROM io
WHERE io_id IN
(stpt_effective, stpt_actual, stpt_base);
-- (these three values come from the previous query)
You can solve this by joining the io table three times to the setpoints table, using the three columns in each individual JOIN:
SELECT a.io_value AS base,
b.io_value AS effective,
c.io_value AS actual
FROM setpoints s
JOIN io a ON a.io_id = s.stpt_base
JOIN io b ON b.io_id = s.stpt_effective
JOIN io c ON c.io_id = s.stpt_actual;
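A quick way to sanity-check the triple join is to run it on SQLite with the ids and values from the question's example (SQLite via Python's sqlite3 is used only for illustration; an ORDER BY is added so the output order is deterministic). Because these are inner joins, a setpoints row whose id is NULL or missing from io would disappear from the result; LEFT JOIN keeps such rows with a NULL value instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE io (io_id INTEGER PRIMARY KEY, io_value REAL)")
conn.execute("CREATE TABLE setpoints "
             "(stpt_base INTEGER, stpt_effective INTEGER, stpt_actual INTEGER)")
conn.executemany("INSERT INTO io VALUES (?, ?)", [
    (24, 2.3), (30, 4.5), (40, 3.44),
    (25, 4.2), (31, 7.7), (41, 4.41),
    (26, 3.9), (32, 8.12), (42, 5.42),
])
conn.executemany("INSERT INTO setpoints VALUES (?, ?, ?)",
                 [(24, 30, 40), (25, 31, 41), (26, 32, 42)])

# One join per referencing column; a, b and c are separate aliases of io.
rows = conn.execute("""
    SELECT a.io_value AS base,
           b.io_value AS effective,
           c.io_value AS actual
    FROM setpoints s
    JOIN io a ON a.io_id = s.stpt_base
    JOIN io b ON b.io_id = s.stpt_effective
    JOIN io c ON c.io_id = s.stpt_actual
    ORDER BY s.stpt_base
""").fetchall()
for row in rows:
    print(row)
```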

SQLAlchemy getting label names out from columns

I want to use the same labels from a SQLAlchemy table, to re-aggregate some data (e.g. I want to iterate through mytable.c to get the column names exactly).
I have some spending data that looks like the following:
| name | region | date | spending |
| John | A | .... | 123 |
| Jack | A | .... | 20 |
| Jill | B | .... | 240 |
I'm then passing it to an existing function we have, that aggregates spending over 2 periods (using a case statement) and groups by region:
grouped table:
| Region | Total (this period) | Total (last period) |
| A | 3048 | 1034 |
| B | 2058 | 900 |
The function returns a SQLAlchemy query object that I can then use subquery() on to re-query e.g.:
subquery = get_aggregated_data(original_table)
region_A_results = session.query(subquery).filter(subquery.c.region == 'A')
I then want to re-aggregate this subquery (summing every column that can be summed and replacing the region column with the string 'other').
The problem is, if I iterate through subquery.c, I get labels that look like:
anon_1.region
anon_1.sum_this_period
anon_1.sum_last_period
Is there a way to get the textual label from a set of column objects, without the anon_1. prefix? Especially since I feel that the prefix may change depending on how SQLAlchemy decides to generate the query.
Split the name string and take the second part. To prepare for the chance that the name is not prefixed by the table name, put the code in a try/except block:
for col in subquery.c:
    try:
        print(col.name.split('.')[1])
    except IndexError:
        print(col.name)
Also, the result proxy (region_A_results) has a keys method, which returns a list of column names. Again, if you don't need the table names, you can easily strip them.
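The try/except can be avoided with str.rsplit, which simply returns a one-element list when there is no dot. A minimal sketch on plain strings (the labels are copied from the question; the helper name is hypothetical and it does not call SQLAlchemy). One caveat: a column name that itself contains a dot would be cut at its last dot.

```python
def strip_prefix(label: str) -> str:
    """Return the part after the last '.', or the label unchanged if there is none."""
    return label.rsplit(".", 1)[-1]


labels = ["anon_1.region", "anon_1.sum_this_period",
          "anon_1.sum_last_period", "region"]
print([strip_prefix(lab) for lab in labels])
# ['region', 'sum_this_period', 'sum_last_period', 'region']
```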

SQL: group by one column, sort by another and transpose a third

I have the following table, which is actually a minimal example of the result of multiple joined tables. I would now like to group by 'person_ID' and get all the 'value' entries in one row, sorted by feature_ID.
person_ID | feature_ID | value
123 | 1 | 1.1
123 | 2 | 1.2
123 | 3 | 1.3
123 | 4 | 1.2
124 | 1 | 1.0
124 | 2 | 1.1
...
The result should be:
123 | 1.1 | 1.2 | 1.3 | 1.2
124 | 1.0 | 1.1 | ...
There should be an elegant SQL solution, but I can neither come up with one nor find one.
For quick reproduction, here is the example data:
create table example(person_ID integer, feature_ID integer, value float);
insert into example(person_ID, feature_ID, value) values
(123,1,1.1),
(123,2,1.2),
(123,3,1.3),
(123,4,1.2),
(124,1,1.0),
(124,2,1.1),
(124,3,1.2),
(124,4,1.4);
Edit: Every person has 6374 entries in the real life application.
I am using a PostgreSQL 8.3.23 database, but I think that should probably be solvable with standard SQL.
Databases aren't good at transposing. There is an open-ended column-growth problem here: how would the database deal with a variable number of columns? It's not a spreadsheet.
This transposing of sorts is normally done in the report writer, not in SQL.
... or in a program, e.g. in PHP.
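As a sketch of the "do it in a program" route, here in Python rather than PHP: the rows would come from a plain query such as SELECT person_ID, feature_ID, value FROM example ORDER BY person_ID, feature_ID, and the program collects each person's values into one list. The example data from the question is inlined below instead of being fetched from a database. A list per person also sidesteps having to declare thousands of result columns for the 6374 features.

```python
from collections import defaultdict

# Example data from the question: (person_ID, feature_ID, value).
ROWS = [
    (123, 1, 1.1), (123, 2, 1.2), (123, 3, 1.3), (123, 4, 1.2),
    (124, 1, 1.0), (124, 2, 1.1), (124, 3, 1.2), (124, 4, 1.4),
]


def transpose(rows):
    """Group by person_ID and list the values ordered by feature_ID."""
    by_person = defaultdict(list)
    # Sorting by (person_ID, feature_ID) fixes the order of values in each row.
    for person_id, feature_id, value in sorted(rows):
        by_person[person_id].append(value)
    return dict(by_person)


for person_id, values in transpose(ROWS).items():
    print(person_id, *values, sep=" | ")
```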
A dynamic cross tab in SQL is only possible with a procedure; see:
https://www.simple-talk.com/sql/t-sql-programming/creating-cross-tab-queries-and-pivot-tables-in-sql/