I have this script:
LOAD * INLINE [
Document, Date, Itam, Line, Saldo, Unit
Doc1, 12.12.2015, Item1, 1, 10, m
Doc1, 3.04.2015, Item2, 2, 15, kg
Doc2, 11.09.2015, Item1, 1, 6, kg
Doc3, 11.04.2015, Item1, 1, 13, m
Doc3, 21.03.2015, Item5, 2, 45, l
Doc4, 20.04.2016, Item4, 1, 7, l
Doc5, 12.01.2016, Item1, 1, 13, kg
Doc6, 7.09.2016, Item3, 1, 16, m
Doc6, 21.09.2016, Item3, 2, 21, kg
Doc7, 1.01.2016, Item2, 1, 3, l
];
I want to calculate the sum of Saldo by year in the script. How can I modify the script to get something like this:
Load
Year2015,
SumOfSaldo2015,
Year2016,
SumOfSaldo2016
Then, when I use SumOfSaldo2015 in a text box, I want it to show the sum of Saldo for 2015.
This seems a slightly strange thing to do, because you should be able to achieve it with dimensions in QlikView: just create a Year dimension based on the date.
LOAD *,
right(Date,4) as Year;
LOAD * INLINE [
Document, Date, Itam, Line, Saldo, Unit
Doc1, 12.12.2015, Item1, 1, 10, m
Doc1, 3.04.2015, Item2, 2, 15, kg
Doc2, 11.09.2015, Item1, 1, 6, kg
Doc3, 11.04.2015, Item1, 1, 13, m
Doc3, 21.03.2015, Item5, 2, 45, l
Doc4, 20.04.2016, Item4, 1, 7, l
Doc5, 12.01.2016, Item1, 1, 13, kg
Doc6, 7.09.2016, Item3, 1, 16, m
Doc6, 21.09.2016, Item3, 2, 21, kg
Doc7, 1.01.2016, Item2, 1, 3, l
];
Or, if you just want a consolidated view with one row per year:
Temp:
LOAD *,
right(Date,4) as Year;
LOAD * INLINE [
Document, Date, Itam, Line, Saldo, Unit
Doc1, 12.12.2015, Item1, 1, 10, m
Doc1, 3.04.2015, Item2, 2, 15, kg
Doc2, 11.09.2015, Item1, 1, 6, kg
Doc3, 11.04.2015, Item1, 1, 13, m
Doc3, 21.03.2015, Item5, 2, 45, l
Doc4, 20.04.2016, Item4, 1, 7, l
Doc5, 12.01.2016, Item1, 1, 13, kg
Doc6, 7.09.2016, Item3, 1, 16, m
Doc6, 21.09.2016, Item3, 2, 21, kg
Doc7, 1.01.2016, Item2, 1, 3, l
];
Saldo:
LOAD
Year AS Year,
sum(Saldo) as SumOfSaldo
RESIDENT Temp
GROUP BY Year;
However, if you really need a separate field for each year, you could do something like this:
Temp:
LOAD *,
right(Date,4) as Year;
LOAD * INLINE [
Document, Date, Itam, Line, Saldo, Unit
Doc1, 12.12.2015, Item1, 1, 10, m
Doc1, 3.04.2015, Item2, 2, 15, kg
Doc2, 11.09.2015, Item1, 1, 6, kg
Doc3, 11.04.2015, Item1, 1, 13, m
Doc3, 21.03.2015, Item5, 2, 45, l
Doc4, 20.04.2016, Item4, 1, 7, l
Doc5, 12.01.2016, Item1, 1, 13, kg
Doc6, 7.09.2016, Item3, 1, 16, m
Doc6, 21.09.2016, Item3, 2, 21, kg
Doc7, 1.01.2016, Item2, 1, 3, l
];
Saldo:
LOAD
Year AS Year2015,
sum(Saldo) as SumOfSaldo2015
RESIDENT Temp
WHERE Year = 2015
GROUP BY Year;
CONCATENATE(Saldo)
LOAD
Year AS Year2016,
sum(Saldo) as SumOfSaldo2016
RESIDENT Temp
WHERE Year = 2016
GROUP BY Year;
DROP TABLE Temp;
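To cross-check the figures the text boxes should display, the same aggregation is sketched below in Python rather than QlikView script. It is only an illustration: the rows are copied from the INLINE table, and the year is taken from the last four characters of Date, which is what right(Date,4) does in the script.

```python
from collections import defaultdict

# (Date, Saldo) pairs copied from the INLINE table; dates are dd.mm.yyyy.
rows = [
    ("12.12.2015", 10), ("3.04.2015", 15), ("11.09.2015", 6),
    ("11.04.2015", 13), ("21.03.2015", 45), ("20.04.2016", 7),
    ("12.01.2016", 13), ("7.09.2016", 16), ("21.09.2016", 21),
    ("1.01.2016", 3),
]

sum_of_saldo = defaultdict(int)
for date, saldo in rows:
    year = date[-4:]  # same idea as right(Date,4) in the load script
    sum_of_saldo[year] += saldo

print(dict(sum_of_saldo))  # {'2015': 89, '2016': 60}
```

So SumOfSaldo2015 should come out as 89 and SumOfSaldo2016 as 60.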
I'm trying to create an empty pandas.DataFrame with a MultiIndex that I can later fill column-wise with my data. I've looked at other answers (here and here), but they all work with data that is not filled in column-wise, or that is somehow connected across the different columns.
The information I want to be contained in the Multi-Index looks like this:
GCM_list = ['BCC-CSM2-MR', 'CAMS-CSM1-0', 'CESM2', 'CESM2-WACCM', 'CMCC-CM2-SR5', 'EC-Earth3', 'EC-Earth3-Veg', 'FGOALS-f3-L', 'GFDL-ESM4', 'INM-CM4-8', 'INM-CM5-0', 'MPI-ESM1-2-HR', 'MRI-ESM2-0', 'NorESM2-MM', 'TaiESM1']
SSP_list = ['SSP_126', 'SSP_245', 'SSP_370', 'SSP_585']
index_years = [2030, 2040, 2050, 2060, 2070, 2080, 2090, 2100]
And I want it to look roughly like this (for the first three items in GCM_list):
BCC-CSM2-MR CAMS-CSM1-0 CESM2
SSP_126 SSP_245 SSP_370 SSP_585 SSP_126 SSP_245 SSP_370 SSP_585 SSP_126 SSP_245 SSP_370 SSP_585
2030 | |
2040 | |
2050 V V
2060 1 2
2070
2080
2090
2100
The "arrows" in the first two columns show how, and in what order, I want to fill the DataFrame after the index is created, in case that matters for this question.
I've tried building the index like this, but I'm not sure what to make of the result. How should I proceed? Is there a way to build this empty dataframe so that I can fill it column after column?
arrays = [GCM_list, SSP_list]
index = pd.MultiIndex.from_arrays(arrays, names=('GCM', 'SSP'))
>>> index
MultiIndex(levels=[[u'BCC-CSM2-MR', u'CAMS-CSM1-0', u'CESM2', u'CESM2-WACCM', u'CMCC-CM2-SR5', u'EC-Earth3', u'EC-Earth3-Veg', u'FGOALS-f3-L', u'GFDL-ESM4', u'INM-CM4-8', u'INM-CM5-0', u'MPI-ESM1-2-HR', u'MRI-ESM2-0', u'NorESM2-MM', u'TaiESM1'], [u'SSP_126', u'SSP_245', u'SSP_370', u'SSP_585']],
labels=[[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14], [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]],
names=[u'GCM', u'SSP'])
Use MultiIndex.from_product, which builds every (GCM, SSP) combination:
import pandas as pd
arrays = [GCM_list, SSP_list]
mux = pd.MultiIndex.from_product(arrays, names=('GCM', 'SSP'))
df = pd.DataFrame(columns=mux, index=index_years)
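To fill the empty frame column by column, in the order of the arrows, each column can be addressed by its (GCM, SSP) tuple. A minimal sketch, assuming only the first three models and using placeholder numbers in place of real data; dtype=float is an added assumption so the empty cells start as float NaN rather than object.

```python
import pandas as pd

GCM_list = ["BCC-CSM2-MR", "CAMS-CSM1-0", "CESM2"]  # shortened for the example
SSP_list = ["SSP_126", "SSP_245", "SSP_370", "SSP_585"]
index_years = [2030, 2040, 2050, 2060, 2070, 2080, 2090, 2100]

mux = pd.MultiIndex.from_product([GCM_list, SSP_list], names=("GCM", "SSP"))
df = pd.DataFrame(index=index_years, columns=mux, dtype=float)

# Iterate over the (GCM, SSP) pairs in column order and fill one whole
# column per step; the values are placeholders standing in for real data.
for n, (gcm, ssp) in enumerate(mux):
    df[(gcm, ssp)] = [n * 10 + i for i in range(len(index_years))]

print(df.loc[2040, ("BCC-CSM2-MR", "SSP_245")])  # 11
```

Because the column order follows from_product, the loop visits BCC-CSM2-MR/SSP_126 first, then BCC-CSM2-MR/SSP_245, matching arrows 1 and 2.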
Edit: I found and posted an example DataFrame that reproduces the original error message.
(I have just realized that the error only appears when a tuple has a certain length. The example has been adapted accordingly.)
Original text:
I need to group by tuples of different lengths.
For each group I apply a summary_function.
import pandas as pd
def summary_function(df):
    value_mean = df['value'].mean()
    df1 = pd.DataFrame({'value_mean': [value_mean]})
    return df1
tuple_list = [(1,2,1,1,1,1,1,1,1,1,1,1,1), (2,3,1,1,1,1,1,1,1,1,1,1,1),
              (1,2,1,1,1,1,1,1,1,1,1,1,1),
              (2,3,4,4,4,4,4,4,4,4,4,4,4,4,4,1,1,1,1,1,1,1,1,1,1,1)]
value = [1,2,3,4]
letter = list('abab')
df = pd.DataFrame({'letter': letter, 'tuple': tuple_list, 'value': value})
df
> letter tuple value
>0 a (1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) 1
>1 b (2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) 2
>2 a (1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) 3
>3 b (2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ... 4
If I call mean() directly, the result is as expected:
df.groupby(['letter','tuple']).mean()
> value
>letter tuple
>a (1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) 2
>b (2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) 2
> (2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...) 4
But if I apply my function instead (which I need to do, since I have dozens of summaries), the tuple grouping breaks. With this simple call:
df.groupby(['letter','tuple']).apply(lambda x:summary_function(x))
I get a ValueError:
>ValueError: Values not found in passed level: MultiIndex([(2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4)],
)
It would be awesome to get some ideas on how to solve this.
In your case, return a Series instead of a DataFrame.
When the applied function returns a Series, pandas lays it out horizontally: the Series' index labels become the result's columns, with one row per group. For example:
def summary_function(df):
    return df['value'].agg(['min', 'mean', 'max'])
df.groupby(['letter', 'tuple']).apply(summary_function)
Output:
value min mean max
letter tuple
a (1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) 1.0 2.0 3.0
b (2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) 2.0 2.0 2.0
(2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1... 4.0 4.0 4.0
The even shorter solution was just to replace "pd.DataFrame" with "pd.Series".
def summary_function(df):
    value_mean = df['value'].mean()
    return pd.Series({'value_mean': value_mean})
(Inspired by the answer of Quang Hoang)
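Because the returned Series' index becomes the columns of the result, adding more summaries only means adding more entries to that Series. A minimal sketch: the extra summary names are illustrative, and it groups by letter alone to keep the example small. Selecting [['value']] before apply keeps the grouping column out of the function, which avoids a deprecation warning in recent pandas.

```python
import pandas as pd

def summary_function(group: pd.DataFrame) -> pd.Series:
    # One entry per summary; each key becomes a column of the result.
    return pd.Series({
        "value_mean": group["value"].mean(),
        "value_min": group["value"].min(),
        "value_max": group["value"].max(),
        "n_rows": len(group),
    })

df = pd.DataFrame({"letter": list("abab"), "value": [1, 2, 3, 4]})
summary = df.groupby("letter")[["value"]].apply(summary_function)
print(summary)
```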
I have a SQL Server table with data on various factories (plants), with rows identified by a root plant ID and a sub plant ID. The root ID stays the same for the facility for its entire life. A new sub ID is added each time the plant data is changed with the regulatory agency.
Sometimes when the plant data was re-filed with the regulator, only the changed data was submitted, and other fields were left blank (Null).
I'm looking for an elegant way to write a query that will return all of the data from the most recent sub ID record, except that for Capacity, it will pull the most recent sub for which a non-Null Capacity was actually specified.
Assume that these are the fields in the Plant table:
RecordId (primary key)
RootId
SubId
Fuel
Capacity
Here is the SQL for selecting the data for the most recent SubId:
SELECT p1.* FROM Plant as p1
WHERE
p1.SubId = (
SELECT TOP 1 p2.SubId FROM Plant as p2
WHERE p1.RootId = p2.RootId
ORDER BY p2.SubId DESC)
I've been thinking about this for a while, but haven't come up with an approach. Even just a push in the right direction would be appreciated. Here is some SQL code to generate sample data:
CREATE TABLE Plant (
RecordId INTEGER PRIMARY KEY,
RootId VARCHAR(12) not null,
SubID INTEGER not null,
Fuel INTEGER not null,
Capacity DECIMAL(10,4)
);
INSERT INTO Plant
VALUES
(451, 'PLT03-39', 3, 1, 4399.67),
(471, 'PLT03-39', 4, 1, 4399.67),
(1809, 'PLT03-39', 5, 1, 4399.67),
(4888, 'PLT03-39', 6, 1, Null),
(6111, 'PLT03-39', 7, 1, Null),
(450, 'PLT03-40', 3, 1, 15531.67),
(472, 'PLT03-40', 4, 1, Null),
(1810, 'PLT03-40', 5, 1, 14767.61),
(4882, 'PLT03-40', 6, 1, Null),
(6113, 'PLT03-40', 7, 1, Null),
(454, 'PLT03-41', 5, 1, 23726.34),
(455, 'PLT03-41', 6, 1, 23726.34),
(469, 'PLT03-41', 7, 1, 23726.34),
(1807, 'PLT03-41', 8, 1, 22850.96),
(4884, 'PLT03-41', 9, 1, 22850.96),
(6110, 'PLT03-41', 10, 1, 22850.96),
(452, 'PLT03-42', 3, 1, 9120.65),
(470, 'PLT03-42', 4, 1, Null),
(1808, 'PLT03-42', 5, 1, 9120.65),
(4883, 'PLT03-42', 6, 1, 9120.65),
(6109, 'PLT03-42', 7, 1, Null),
(449, 'PLT03-43', 4, 1, 7923.96),
(474, 'PLT03-43', 5, 1, 7923.96),
(1811, 'PLT03-43', 6, 1, 7357.24),
(4881, 'PLT03-43', 7, 1, Null),
(5107, 'PLT03-43', 7, 1, 7711.44),
(5133, 'PLT03-43', 7, 1, Null),
(6112, 'PLT03-43', 8, 1, 7711.44),
(98, 'PLT05-25', 2, 18, 26.565),
(528, 'PLT05-25', 2, 18, 26033.7),
(139, 'PLT05-25', 2, 18, 26565),
(380, 'PLT05-25', 2, 18, Null),
(381, 'PLT05-25', 2, 18, 51854.88),
(7398, 'PLT06-143', 0, 18, 4091.01),
(4112, 'PLT06-143', 1, 18, 4091.01),
(5309, 'PLT06-143', 2, 18, 4091.01),
(73982, 'PLT06-143', 2, 18, 4091.01),
(73981, 'PLT06-143', 3, 18, Null),
(7397, 'PLT06-145', 0, 18, 4091.01),
(73971, 'PLT06-145', 1, 18, 4091.01),
(4109, 'PLT06-145', 1, 18, Null),
(5314, 'PLT06-145', 2, 18, 4091.01),
(73972, 'PLT06-145', 2, 18, Null),
(73973, 'PLT06-145', 3, 18, 4091.01),
(177, 'PLT06-342', 2, 1, 35420),
(1307, 'PLT06-342', 3, 1, 30360),
(5946, 'PLT06-342', 4, 1, 30360),
(6220, 'PLT06-342', 5, 1, Null),
(13264, 'PLT06-342', 6, 1, Null),
(1312, 'PLT06-344', 2, 1, 15180),
(5106, 'PLT06-344', 3, 1, 15180),
(5945, 'PLT06-344', 4, 1, 15180),
(6218, 'PLT06-344', 5, 1, Null),
(10550, 'PLT06-344', 6, 1, 10120),
(13271, 'PLT06-344', 7, 1, 10120),
(2724, 'PLT06-87', 2, 6, 143.451),
(5039, 'PLT06-87', 3, 6, 143.451),
(5886, 'PLT06-87', 4, 6, Null),
(10586, 'PLT06-87', 5, 6, 143.451),
(22759, 'PLT06-87', 6, 6, Null),
(158, 'PLT07-234', 1, 18, 21274.77),
(341, 'PLT07-234', 2, 18, 21274.77),
(7813, 'PLT07-234', 3, 18, 21274.77),
(24562, 'PLT07-234', 4, 18, Null),
(24584, 'PLT07-234', 4, 18, 2488.508),
(5965, 'PLT07-328', 2, 1, 19607.5),
(6073, 'PLT07-328', 2, 1, 19607.5),
(5996, 'PLT07-328', 2, 1, 19607.5),
(6644, 'PLT07-328', 3, 1, 19607.5),
(6701, 'PLT07-328', 3, 1, Null),
(7664, 'PLT07-328', 4, 1, Null),
(227, 'PLT07-39', 2, 18, 50347),
(1269, 'PLT07-39', 3, 18, 50258.45),
(1821, 'PLT07-39', 4, 18, 50258.45),
(1976, 'PLT07-39', 4, 18, 50258.45),
(5282, 'PLT07-39', 5, 18, Null),
(374, 'PLT08-25', 2, 18, 55331.1),
(135, 'PLT08-25', 2, 18, 30.36),
(134, 'PLT08-25', 2, 18, 56.925),
(533, 'PLT08-25', 2, 18, 55.7865),
(93, 'PLT08-25', 2, 18, 56.925),
(4081, 'PLT08-437', 1, 18, 5206.74),
(4241, 'PLT08-437', 2, 18, 5206.74),
(4242, 'PLT08-437', 3, 18, 5206.74),
(4532, 'PLT08-437', 4, 18, 4946.656),
(24344, 'PLT08-437', 5, 18, Null),
(460, 'PLT10-574', 0, 18, 198207.284),
(943, 'PLT10-574', 2, 18, 198207.284),
(1248, 'PLT10-574', 3, 18, 198207.284),
(2371, 'PLT10-574', 4, 18, 198207.284),
(6173, 'PLT10-574', 5, 18, 198207.284),
(17787, 'PLT10-574', 6, 18, 198207.284),
(23533, 'PLT10-574', 7, 18, 198207.284)
;
And here is the expected result of the query I'm seeking:
RecordId RootId SubId Fuel Capacity
6111 PLT03-39 7 1 4399.67
6113 PLT03-40 7 1 14767.61
6110 PLT03-41 10 1 22850.96
6109 PLT03-42 7 1 9120.65
6112 PLT03-43 8 1 7711.44
381 PLT05-25 2 18 51854.88
7398 PLT06-143 3 18 4091.01
7397 PLT06-145 3 18 4091.01
13264 PLT06-342 6 1 30360
13271 PLT06-344 7 1 10120
22759 PLT06-87 6 6 143.451
24584 PLT07-234 4 18 2488.508
7664 PLT07-328 4 1 19607.5
5282 PLT07-39 5 18 50258.45
93 PLT08-25 2 18 56.925
24344 PLT08-437 5 18 4946.656
23533 PLT10-574 7 18 198207.284
Below is one solution to this problem. I used a CTE and the MAX aggregate to determine the latest RecordId for each RootId. After joining that back to the Plant table, I used an OUTER APPLY to retrieve the most recent non-null capacity.
WITH LATEST AS
(
SELECT RootId, MAX(RecordId) AS RecordId
FROM Plant
GROUP BY RootId
)
SELECT
P.RecordId
, P.RootId
, P.SubID
, P.Fuel
, CAP.Capacity
FROM
LATEST AS L
JOIN Plant AS P
ON L.RecordId = P.RecordId
OUTER APPLY
(
SELECT TOP 1 Capacity
FROM Plant
WHERE RootId = P.RootId AND Capacity IS NOT NULL
ORDER BY SubID DESC
) AS CAP
ORDER BY
L.RootId
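The same rule can be sketched in pandas to cross-check results on a subset of the sample data. This is not the SQL above: it picks each plant's latest row by SubId (with RecordId as an assumed tie-breaker, since the question does not say how ties are resolved) rather than by MAX(RecordId). The two differ for roots such as PLT06-143, where the highest RecordId (73982) carries SubId 2 while SubId 3 exists. The function name latest_with_capacity is hypothetical.

```python
import pandas as pd

def latest_with_capacity(plant: pd.DataFrame) -> pd.DataFrame:
    # Order each plant's filings oldest to newest; RecordId breaks SubId ties.
    ordered = plant.sort_values(["RootId", "SubId", "RecordId"])
    grouped = ordered.groupby("RootId")
    # GroupBy.last() returns the last non-null value per group,
    # i.e. the most recent Capacity that was actually filed.
    last_capacity = grouped["Capacity"].last()
    # tail(1) keeps the whole most recent row for each plant.
    latest = grouped.tail(1).set_index("RootId")
    latest["Capacity"] = last_capacity  # aligned on RootId
    return latest.reset_index()[["RecordId", "RootId", "SubId", "Fuel", "Capacity"]]

# First two plants from the sample INSERT statement.
rows = [
    (451, "PLT03-39", 3, 1, 4399.67), (471, "PLT03-39", 4, 1, 4399.67),
    (1809, "PLT03-39", 5, 1, 4399.67), (4888, "PLT03-39", 6, 1, None),
    (6111, "PLT03-39", 7, 1, None),
    (450, "PLT03-40", 3, 1, 15531.67), (472, "PLT03-40", 4, 1, None),
    (1810, "PLT03-40", 5, 1, 14767.61), (4882, "PLT03-40", 6, 1, None),
    (6113, "PLT03-40", 7, 1, None),
]
plant = pd.DataFrame(rows, columns=["RecordId", "RootId", "SubId", "Fuel", "Capacity"])
result = latest_with_capacity(plant)
print(result)
```

For these two plants the output matches the first two rows of the expected result (6111 with 4399.67, and 6113 with 14767.61).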
I have a large DataFrame with multiple columns and rows, grouped by geographic location and date. The problem is that I have too many date columns. I think I need to reshape this DataFrame so that "GeographyCode", "Number of Awards", "Secondary School Stage", "SCQF Level" and "DateCode" are each a single column. I do not know whether my data can be used for scikit-learn linear regression. Please help.
pivot02.columns
MultiIndex(levels=[['Number Of Awards', 'SCQF Level', 'Secondary School Stage'], ['2002/2003', '2003/2004', '2004/2005', '2005/2006', '2006/2007', '2007/2008', '2008/2009', '2009/2010', '2010/2011', '2011/2012', '2012/2013']],
labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]],
names=[None, 'DateCode'])
I have successfully grouped by geographic location, 'Number Of Awards', 'SCQF Level' and 'Secondary School Stage', but the final output has MultiIndex columns, and I do not know whether I can use that in linear regression. Is this OK for machine learning?
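One common way to go from this wide layout (a measure/DateCode MultiIndex on the columns) to one row per geography and date is DataFrame.stack on the DateCode level, followed by reset_index. The sketch below uses a small hypothetical stand-in for pivot02: the GeographyCode values and the numbers are made up, and only two DateCode values are included. After reshaping, each measure is an ordinary flat column that can be selected as a feature or target.

```python
import pandas as pd

# Hypothetical miniature pivot02: rows are GeographyCode, columns are a
# (measure, DateCode) MultiIndex shaped like the one shown above.
columns = pd.MultiIndex.from_product(
    [["Number Of Awards", "SCQF Level", "Secondary School Stage"],
     ["2002/2003", "2003/2004"]],
    names=[None, "DateCode"],
)
pivot02 = pd.DataFrame(
    [[10, 12, 5, 5, 4, 4],
     [7, 9, 6, 6, 5, 5]],
    index=pd.Index(["S01", "S02"], name="GeographyCode"),
    columns=columns,
)

# Move DateCode from the columns into the row index, then turn the
# row index (GeographyCode, DateCode) into ordinary columns.
long_df = pivot02.stack(level="DateCode").reset_index()
print(long_df)
```

Depending on the pandas version, stack may emit a FutureWarning about its implementation; the reshaped result is the same here.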
I'm building a graph that allows edges to be toggled on and off, so I need to be able to add and remove them repeatedly. I have noticed that the degrees of nodes attached to toggled edges become wrong. I've included an example.
My code:
allElements = cy.elements();
....
var allEdges = allElements.filter('edge');
var allNodes = allElements.filter('node');
for (var i = 0; i < 5; i++) {
  // DELETE
  var printThis = [];
  allNodes.filter(function(i, ele){
    printThis.push(ele.degree());
  });
  console.log(printThis);
  cy.remove(allEdges);
  cy.add(allEdges);
}
Returns:
[1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 6, 1, 2, 1, 1, 1, 36, 8, 3, 4, 4, 2, 1, 1, 1, 1, 1, 1, 2]
[1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 6, 1, 2, 1, 1, 1, 36, 8, 3, 4, 4, 2, 1, 1, 1, 1, 1, 1, 2]
[2, 2, 2, 2, 2, 6, 2, 2, 2, 2, 2, 12, 2, 4, 2, 2, 2, 72, 16, 6, 8, 8, 4, 2, 2, 2, 2, 2, 2, 4]
[3, 3, 3, 3, 3, 9, 3, 3, 3, 3, 3, 18, 3, 6, 3, 3, 3, 108, 24, 9, 12, 12, 6, 3, 3, 3, 3, 3, 3, 6]
[4, 4, 4, 4, 4, 12, 4, 4, 4, 4, 4, 24, 4, 8, 4, 4, 4, 144, 32, 12, 16, 16, 8, 4, 4, 4, 4, 4, 4, 8]
This shows that, after the first cycle, removing the edges doesn't decrease the degree of the nodes they're attached to.
How can I have cytoscape return the correct degree?
Thank you for notifying us of the issue. We will get a fix in for 2.0.3 -M
https://github.com/cytoscape/cytoscape.js/issues/360