How does the reduce_agg function in presto work? - sql

the docs explain reduce_agg here: https://prestodb.io/docs/current/functions/aggregate.html
reduce_agg(inputValue T, initialState S, inputFunction(S, T, S), combineFunction(S, S, S)) → S
Reduces all input values into a single value. inputFunction will be invoked for each input value. In addition to taking the input value, inputFunction takes the current state, initially initialState, and returns the new state. combineFunction will be invoked to combine two states into a new state. The final state is returned:
SELECT id, reduce_agg(value, 0, (a, b) -> a + b, (a, b) -> a + b)
FROM (
VALUES
(1, 2),
(1, 3),
(1, 4),
(2, 20),
(2, 30),
(2, 40)
) AS t(id, value)
GROUP BY id;
-- (1, 9)
-- (2, 90)
SELECT id, reduce_agg(value, 1, (a, b) -> a * b, (a, b) -> a * b)
FROM (
VALUES
(1, 2),
(1, 3),
(1, 4),
(2, 20),
(2, 30),
(2, 40)
) AS t(id, value)
GROUP BY id;
-- (1, 24)
-- (2, 24000)
What exactly does the combineFunction argument do?
Changing the combineFunction to the following all result in the same output.
SELECT id, reduce_agg(value, 0, (a, b) -> a + b, (a, b) -> 0)
FROM (
VALUES
(1, 2),
(1, 3),
(1, 4),
(2, 20),
(2, 30),
(2, 40)
) AS t(id, value)
GROUP BY id;
-- (1, 9)
-- (2, 90)
SELECT id, reduce_agg(value, 0, (a, b) -> a + b, (a, b) -> b)
FROM (
VALUES
(1, 2),
(1, 3),
(1, 4),
(2, 20),
(2, 30),
(2, 40)
) AS t(id, value)
GROUP BY id;
-- (1, 9)
-- (2, 90)

Related

Multiindex columns to Index by concatenating column's name with int and strings [duplicate]

This question already has answers here:
How to flatten a hierarchical index in columns
(19 answers)
Closed last year.
This post was edited and submitted for review last year and failed to reopen the post:
Original close reason(s) were not resolved
I have my df.columns = this MultiIndex
MultiIndex([( 'F', 1),
( 'F', 2),
( 'F', 3),
( 'F', 4),
( 'F', 5),
( 'F', 6),
( 'F', 7),
( 'F', 8),
( 'F', 9),
( 'FG', 1),
( 'FG', 2),
( 'FG', 3),
( 'FG', 4),
( 'FG', 5),
( 'FG', 6),
( 'FG', 7),
( 'FG', 8),
( 'FG', 9),
( 'R', 1),
( 'R', 2),
( 'R', 3),
( 'R', 4),
( 'R', 5),
( 'R', 6),
( 'R', 7),
( 'R', 8),
( 'R', 9),
('RG', 1),
('RG', 2),
('RG', 3),
('RG', 4),
('RG', 5),
('RG', 6),
('RG', 7),
('RG', 8),
('RG', 9),
( 'FR', 1),
( 'FR', 2),
( 'FR', 3),
( 'FR', 4),
( 'FR', 5),
( 'FR', 6),
( 'FR', 7),
( 'FR', 8),
( 'FR', 9)],
names=[None, 'Profondeur'])
I want it to become Index like this :
Index(['F_1', 'F_2', 'F_3', 'F_4', 'F_5', 'F_6', 'F_7', 'F_8', 'F_9',
'FG_1', 'FG_2', 'FG_3', 'FG_4', 'FG_5', 'FG_6', 'FG_7', 'FG_8', 'FG_9',
'R_1', 'R_2', 'R_3', 'R_4', 'R_5', 'R_6', 'R_7', 'R_8', 'R_9',
'RG_1', 'RG_2', 'RG_3', 'RG_4', 'RG_5', 'RG_6', 'RG_7', 'RG_8', 'RG_9',
'FR_1', 'FR_2', 'FR_3', 'FR_4', 'FR_5', 'FR_6', 'FR_7', 'FR_8', 'FR_9'], dtype='object')
I don't know much about MultiIndex and I obtained it after an unstack() method. In fact I want to combine both level concatenating their columns names with '_' as a separator. Also 'Profondeur' label is int and maybe it needs to be turned to str
Collapse a MultiIndex into a simple Index by assigning the columns back to a list. In this case, use a list comprehension to '_'.join the two components of the MultiIndex (the values are simple tuples), with an extra step of mapping to strings because you can't string join a string with an int.
df.columns = ['_'.join(map(str, x)) for x in df.columns]
print(df.columns)
#Index(['F_1', 'F_2', 'F_3', 'F_4', 'F_5', 'F_6', 'F_7', 'F_8', 'F_9', 'FG_1',
# 'FG_2', 'FG_3', 'FG_4', 'FG_5', 'FG_6', 'FG_7', 'FG_8', 'FG_9', 'R_1',
# 'R_2', 'R_3', 'R_4', 'R_5', 'R_6', 'R_7', 'R_8', 'R_9', 'RG_1', 'RG_2',
# 'RG_3', 'RG_4', 'RG_5', 'RG_6', 'RG_7', 'RG_8', 'RG_9', 'FR_1', 'FR_2',
# 'FR_3', 'FR_4', 'FR_5', 'FR_6', 'FR_7', 'FR_8', 'FR_9'],
# dtype='object')

Julia - JuMP constraints with array of tuples as indexes

The following is working in Julia JuMP:
#variable(m, δ[i=V, j=tildeV ; i != j && ŷ[i] < .5 && s[i,j] == sim_i[i]] >= 0)
While
Ω = [(i,j) for i in V, j in tildeV if i != j && ŷ[i] < .5 && s[i,j] == sim_i[i]]
#variable(m, δ[(i,j)=Ω] >= 0)
Results an error:
ERROR: UndefVarError: i not defined
What am I doing wrong? I could not find in the documentation. I tried: δ[(i,j)...=Ω] and δ[(i,j)=Ω...]
You might want to use a simpler Ω for minimal example like
Ω = [(i,j) for i in 1:5, j in 1:5]
These all work, so there must be a problem elsewhere in your code.
julia> S = [(1,2), (3,4)]
2-element Vector{Tuple{Int64, Int64}}:
(1, 2)
(3, 4)
julia> model = Model()
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: NO_OPTIMIZER
Solver name: No optimizer attached.
julia> #variable(model, x[(i, j) = S])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, [(1, 2), (3, 4)]
And data, a 2-element Vector{VariableRef}:
x[(1, 2)]
x[(3, 4)]
julia> Ω = [(i,j) for i in 1:5, j in 1:5]
5×5 Matrix{Tuple{Int64, Int64}}:
(1, 1) (1, 2) (1, 3) (1, 4) (1, 5)
(2, 1) (2, 2) (2, 3) (2, 4) (2, 5)
(3, 1) (3, 2) (3, 3) (3, 4) (3, 5)
(4, 1) (4, 2) (4, 3) (4, 4) (4, 5)
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5)
julia> model = Model()
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: NO_OPTIMIZER
Solver name: No optimizer attached.
julia> #variable(model, x[(i, j) = Ω])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, [(1, 1) (1, 2) … (1, 4) (1, 5); (2, 1) (2, 2) … (2, 4) (2, 5); … ; (4, 1) (4, 2) … (4, 4) (4, 5); (5, 1) (5, 2) … (5, 4) (5, 5)]
And data, a 25-element Vector{VariableRef}:
x[(1, 1)]
x[(2, 1)]
x[(3, 1)]
x[(4, 1)]
x[(5, 1)]
x[(1, 2)]
x[(2, 2)]
x[(3, 2)]
x[(4, 2)]
x[(5, 2)]
⋮
x[(2, 4)]
x[(3, 4)]
x[(4, 4)]
x[(5, 4)]
x[(1, 5)]
x[(2, 5)]
x[(3, 5)]
x[(4, 5)]
x[(5, 5)]

Pure PostgreSQL replacement for PL/R sample() function?

Our new database does not (and will not) support PL/R usage, which we rely on extensively to implement a random weighted sample function:
CREATE OR REPLACE FUNCTION sample(
ids bigint[],
size integer,
seed integer DEFAULT 1,
with_replacement boolean DEFAULT false,
probabilities numeric[] DEFAULT NULL::numeric[])
RETURNS bigint[]
LANGUAGE 'plr'
COST 100
VOLATILE
AS $BODY$
set.seed(seed)
ids = as.integer(ids)
if (length(ids) == 1) {
s = rep(ids,size)
} else {
s = sample(ids,size, with_replacement,probabilities)
}
return(s)
$BODY$;
Is there a purely SQL approach to this same function? This post shows an approach that selects a single random row, but does not have the functionality of sampling multiple groups at once.
As far as I know, SQL Fiddle does not support PLR, so see below for a quick replication example:
CREATE TABLE test
(category text, uid integer, weight numeric)
;
INSERT INTO test
(category, uid, weight)
VALUES
('a', 1, 45),
('a', 2, 10),
('a', 3, 25),
('a', 4, 100),
('a', 5, 30),
('b', 6, 20),
('b', 7, 10),
('b', 8, 80),
('b', 9, 40),
('b', 10, 15),
('c', 11, 20),
('c', 12, 10),
('c', 13, 80),
('c', 14, 40),
('c', 15, 15)
;
SELECT category,
unnest(diffusion_shared.sample(array_agg(uid ORDER BY uid),
1,
1,
True,
array_agg(weight ORDER BY uid))
) as uid
FROM test
WHERE category IN ('a', 'b')
GROUP BY category;
Which outputs:
category uid
'a' 4
'b' 8
Any ideas?

Deriving the first instance of a specific node type in a tree structure query using SQL

I am designing an Electrical design software that will model an electrical utility system from the incoming Power Utility right down to the individual circuits such as computers and coffee machines.
I want to give each component of the system a dedicated table. eg. Transformers, Loads, cables, PowerPanels (called buses in this example).
Each component can be connected to one or many other components. I am using a parent/child table to manage the connections and plan to use a CTE to derive the hierarchical tree structure for a given component.
The voltage supplying any component in the system will be derived by finding the first instance of a transformer or a utility in the tree.
I have developed a query that can handle this as demonstrated below.
However, this only works for selecting one component in the CTE. I am looking for a way to select all buses and their connected voltage (nearest trafo or Utility). The only solution I can come up with is to use a table function on the above query. Is there a better way of doing this.
CREATE TABLE #componentConnection
(componentConnectionID int, parentComponentID varchar(4), childComponentID int)
;
INSERT INTO #componentConnection
(componentConnectionID, parentComponentID, childComponentID)
VALUES
(1, '13', 18),
(2, '13', 19),
(3, '13', 20),
(4, '13', 21),
(5, '13', 22),
(6, '13', 23),
(7, '14', 24),
(8, '14', 25),
(9, '14', 26),
(10, '14', 27),
(11, '14', 28),
(12, '14', 29),
(13, '15', 30),
(14, '15', 31),
(15, '15', 32),
(16, '15', 33),
(17, '15', 34),
(18, '15', 35),
(19, '16', 36),
(20, '16', 37),
(21, '16', 38),
(22, '16', 39),
(23, '16', 40),
(24, '16', 41),
(25, '1', 5),
(27, '5', 13),
(28, NULL, 1),
(29, '18', 6),
(30, '6', 11),
(31, '11', 7),
(32, '7', 14)
;
CREATE TABLE #component
(componentID int, componentName varchar(8), componentType varchar(7))
;
INSERT INTO #component
(componentID, componentName, componentType)
VALUES
(1, 'Utility1', 'utility'),
(2, 'Utility2', 'utility'),
(3, 'utility3', 'utility'),
(4, 'utility4', 'utility'),
(5, 'Cable1', 'cable'),
(6, 'Cable2', 'cable'),
(7, 'Cable3', 'cable'),
(8, 'Cable4', 'cable'),
(9, 'Cable5', 'cable'),
(10, 'Cable6', 'cable'),
(11, 'Trafo1', 'trafo'),
(12, 'Trafo2', 'trafo'),
(13, 'Bus1', 'bus'),
(14, 'Bus2', 'bus'),
(15, 'Bus3', 'bus'),
(16, 'Bus4', 'bus'),
(17, 'Bus5', 'bus'),
(18, 'cub1', 'cir'),
(19, 'cub2', 'cir'),
(20, 'cub3', 'cir'),
(21, 'cub4', 'cir'),
(22, 'cub5', 'cir'),
(23, 'cub6', 'cir'),
(24, 'cub1', 'cir'),
(25, 'cub2', 'cir'),
(26, 'cub3', 'cir'),
(27, 'cub4', 'cir'),
(28, 'cub5', 'cir'),
(29, 'cub6', 'cir'),
(30, 'cub1', 'cir'),
(31, 'cub2', 'cir'),
(32, 'cub3', 'cir'),
(33, 'cub4', 'cir'),
(34, 'cub5', 'cir'),
(35, 'cub6', 'cir'),
(36, 'cub1', 'cir'),
(37, 'cub2', 'cir'),
(38, 'cub3', 'cir'),
(39, 'cub4', 'cir'),
(40, 'cub5', 'cir'),
(41, 'cub6', 'cir')
;
CREATE TABLE #utility
([utilityID] int, [componentID] int, [utlityKV] float)
;
INSERT INTO #utility
([utilityID], [componentID], [utlityKV])
VALUES
(1, 1, 0.4),
(2, 2, 0.208),
(4, 3, 0.48),
(5, 4, 0.208)
;
CREATE TABLE #transformer
([transformerID] int, [componentID] int, [facilityID] int, [transformerName] varchar(4), [transformerPrimaryTapKv] float, [transformerSecondaryTapKv] float, [transformerPrimaryKv] float, [transformerSecondaryKv] float)
;
INSERT INTO #transformer
([transformerID], [componentID], [facilityID], [transformerName], [transformerPrimaryTapKv], [transformerSecondaryTapKv], [transformerPrimaryKv], [transformerSecondaryKv])
VALUES
(3, 11, 1, NULL, 0.48, 0.208, 0.48, 0.208),
(4, 12, 2, NULL, 0.48, 0.4, 0.48, 0.4)
;
CREATE TABLE #Bus
([busID] int, [busTypeID] int, [componentID] int, [bayID] int, [busName] varchar(4), [busConductorType] varchar(6), [busRatedCurrent] int)
;
INSERT INTO #Bus
([busID], [busTypeID], [componentID], [bayID], [busName], [busConductorType], [busRatedCurrent])
VALUES
(8, 1, 13, 1, 'bus1', 'Copper', 60),
(9, 1, 14, 1, 'bus2', 'copper', 50),
(10, 2, 15, 1, 'bus3', 'copper', 35),
(11, 2, 16, 1, 'bus4', 'copper', 35),
(13, 1, 17, 1, 'bus5', 'copper', 50)
;
WITH CTE AS (SELECT childComponentID AS SourceID, childComponentID, 0 AS depth
FROM #ComponentConnection
UNION ALL
SELECT C1.SourceID, C.childComponentID, c1.depth + 1 AS depth
FROM #ComponentConnection AS C INNER JOIN
CTE AS C1 ON C.parentComponentID = C1.childComponentID)
SELECT childComponentID,b.busName, min(depth)
--,c.componentType
,isnull(t.transformerSecondaryKv,u.utlityKV) kV
FROM CTE AS CTE1
join #Component c
on CTE1.SourceID = c.componentID
left join #Utility u
on CTE1.SourceID = u.componentID
left join #Transformer t
on CTE1.SourceID = t.componentID
LEFT JOIN #Bus b
on cte1.childComponentID = b.componentID
where busName is not null and c.componentType in ('Utility','trafo')
group by childComponentID,b.busName,isnull(t.transformerSecondaryKv,u.utlityKV)
order by depth
The desired result would be as follows for a Bus. I want to list all Buses and their associated Voltage. I would select all from the Bus table and derive the voltage from the heirarchical structure
Result
BusName | Voltage
Bus 1 | 0.4
Bus 2 | 0.208
Bus 3 | etc

Convert pandas Series/DataFrame to numpy matrix, unpacking coordinates from index

I have a pandas series as so:
A 1
B 2
C 3
AB 4
AC 5
BA 4
BC 8
CA 5
CB 8
Simple code to convert to a matrix as such:
1 4 5
4 2 8
5 8 3
Something fairly dynamic and built in, rather than many loops to fix this 3x3 problem.
You can do it this way.
import pandas as pd
# your raw data
raw_index = 'A B C AB AC BA BC CA CB'.split()
values = [1, 2, 3, 4, 5, 4, 8, 5, 8]
# reformat index
index = [(a[0], a[-1]) for a in raw_index]
multi_index = pd.MultiIndex.from_tuples(index)
df = pd.DataFrame(values, columns=['values'], index=multi_index)
df.unstack()
df.unstack()
Out[47]:
values
A B C
A 1 4 5
B 4 2 8
C 5 8 3
For pd.DataFrame uses .values member or else .to_records(...) method
For pd.Series use .unstack() method as Jianxun Li said
import numpy as np
import pandas as pd
d = pd.DataFrame(data = {
'var':['A','B','C','AB','AC','BA','BC','CA','CB'],
'val':[1,2,3,4,5,4,8,5,8] })
# Here are some options for converting to np.matrix ...
np.matrix( d.to_records(index=False) )
# matrix([[(1, 'A'), (2, 'B'), (3, 'C'), (4, 'AB'), (5, 'AC'), (4, 'BA'),
# (8, 'BC'), (5, 'CA'), (8, 'CB')]],
# dtype=[('val', '<i8'), ('var', 'O')])
# Here you can add code to rearrange it, e.g.
[(val, idx[0], idx[-1]) for val,idx in d.to_records(index=False) ]
# [(1, 'A', 'A'), (2, 'B', 'B'), (3, 'C', 'C'), (4, 'A', 'B'), (5, 'A', 'C'), (4, 'B', 'A'), (8, 'B', 'C'), (5, 'C', 'A'), (8, 'C', 'B')]
# and if you need numeric row- and col-indices:
[ (val, 'ABCDEF...'.index(idx[0]), 'ABCDEF...'.index(idx[-1]) ) for val,idx in d.to_records(index=False) ]
# [(1, 0, 0), (2, 1, 1), (3, 2, 2), (4, 0, 1), (5, 0, 2), (4, 1, 0), (8, 1, 2), (5, 2, 0), (8, 2, 1)]
# you can sort by them:
sorted([ (val, 'ABCDEF...'.index(idx[0]), 'ABCDEF...'.index(idx[-1]) ) for val,idx in d.to_records(index=False) ], key=lambda x: x[1:2] )