How to get the average of values in a table - KQL

I'm trying to get some information out of Log Analytics and I want to know if I can extract the average of values from different lines.
For example, let's say I have a table that goes like this:
.create table customer (name: string, month: int, salary: long, living: long)
.ingest inline into table customer <|
gabriel, 1, 1000, 500
gabriel, 2, 1000, 800
gabriel, 3, 2500, 800
gabriel, 4, 2500, 800
John, 1, 1500, 1000
John, 2, 1500, 500
John, 3, 1500, 500
John, 4, 1500, 1200
jina, 1, 3000, 1000
jina, 2, 3000, 1000
jina, 3, 3000, 1500
jina, 4, 5000, 2500
Here we have the simplest possible table to explain my inquiry: we're listing the salary and living expenses of each customer per month (namely months 1, 2, 3 and 4).
Then I want to know the average salary and living expenses of gabriel, John and jina in this period of 4 months.
The actual query I want to apply this to is a tad more complicated, but this is enough to explain my problem.

I think this is what you are looking for:
datatable(name:string, month:int, salary:long, living:long)[
'gabriel', 1, 1000, 500,
'gabriel', 2, 1000, 800,
'gabriel', 3, 2500, 800,
'gabriel', 4, 2500, 800,
'John', 1, 1500, 1000,
'John', 2, 1500, 500,
'John', 3, 1500, 500,
'John', 4, 1500, 1200,
'jina', 1, 3000, 1000,
'jina', 2, 3000, 1000,
'jina', 3, 3000, 1500,
'jina', 4, 5000, 2500]
| summarize Avg_Salary=avg(salary), Avg_Expenses=avg(living) by name
Result:
name Avg_Salary Avg_Expenses
gabriel 1750 725
John 1500 800
jina 3500 1500

Related

Get point geometries as rows for each vertex in SDO_GEOMETRY line

Oracle 18c:
It's possible to get SDO_GEOMETRY line vertex ordinates as rows using the sdo_util.getvertices() function:
with cte as (
select 100 as asset_id, sdo_geometry('linestring (10 20, 30 40)') shape from dual union all
select 200 as asset_id, sdo_geometry('linestring (50 60, 70 80, 90 100)') shape from dual union all
select 300 as asset_id, sdo_geometry('linestring (110 120, 130 140, 150 160, 170 180)') shape from dual)
select
cte.asset_id,
id as vertex_id,
v.x,
v.y
from
cte, sdo_util.getvertices(shape) v
ASSET_ID VERTEX_ID X Y
---------- ---------- ---------- ----------
100 1 10 20
100 2 30 40
200 1 50 60
200 2 70 80
200 3 90 100
300 1 110 120
300 2 130 140
300 3 150 160
300 4 170 180
The resulting rows have columns with ordinates as numbers.
I want to do something similar, but I want to get point geometries as rows for each vertex in the lines, instead of numbers.
The result would look like this:
ASSET_ID VERTEX_ID SHAPE
---------- ---------- ----------------
100 1 [SDO_GEOMETRY]
100 2 [SDO_GEOMETRY]
200 1 [SDO_GEOMETRY]
200 2 [SDO_GEOMETRY]
200 3 [SDO_GEOMETRY]
300 1 [SDO_GEOMETRY]
300 2 [SDO_GEOMETRY]
300 3 [SDO_GEOMETRY]
300 4 [SDO_GEOMETRY]
Idea:
There is an undocumented function called SDO_UTIL.GET_COORDINATE(geometry, point_number).
(The name of that function seems misleading: it returns a point geometry, not a coordinate.)
select
cte.asset_id,
sdo_util.get_coordinate(shape,1) as first_point
from
cte
ASSET_ID FIRST_POINT
---------- ---------------------
100 [MDSYS.SDO_GEOMETRY]
200 [MDSYS.SDO_GEOMETRY]
300 [MDSYS.SDO_GEOMETRY]
That function could be useful for getting vertices as point geometries.
Question:
Is there a way to get point geometries as rows for each vertex in the SDO_GEOMETRY lines?
If you want the output as an MDSYS.ST_POINT data type then convert the MDSYS.SDO_GEOMETRY type to an MDSYS.ST_LINESTRING type and use the ST_NumPoints() and ST_PointN(index) member functions (from the MDSYS.ST_CURVE super-type) in a LATERAL joined hierarchical sub-query:
with cte (asset_id, shape) as (
select 100, sdo_geometry('linestring (10 20, 30 40)') from dual union all
select 200, sdo_geometry('linestring (50 60, 70 80, 90 100)') from dual union all
select 300, sdo_geometry('linestring (110 120, 130 140, 150 160, 170 180)') from dual
)
select c.asset_id,
p.point
from cte c
CROSS JOIN LATERAL (
SELECT ST_LINESTRING(c.shape).ST_PointN(LEVEL) AS point
FROM DUAL
CONNECT BY LEVEL <= ST_LINESTRING(c.shape).ST_NumPoints()
) p;
Try...
with cte as (
select 100 as asset_id, sdo_geometry('linestring (10 20, 30 40)') shape from dual union all
select 200 as asset_id, sdo_geometry('linestring (50 60, 70 80, 90 100)') shape from dual union all
select 300 as asset_id, sdo_geometry('linestring (110 120, 130 140, 150 160, 170 180)') shape from dual
)
select
c.asset_id,
id as vertex_id,
sdo_geometry(trunc(c.shape.sdo_gtype / 10) * 10 + 1, -- e.g. gtype 2002 (line) -> 2001 (point)
c.shape.sdo_srid,
sdo_point_type(v.x, v.y, v.z),
null,null) as point
from
cte c, sdo_util.getvertices(c.shape) v
I came up with a cross join and connect by level solution that seems to work, though there might be more succinct ways of doing it.
with
data as (
select 100 as asset_id, sdo_geometry('linestring (10 20, 30 40)') shape from dual union all
select 200 as asset_id, sdo_geometry('linestring (50 60, 70 80, 90 100)') shape from dual union all
select 300 as asset_id, sdo_geometry('linestring (110 120, 130 140, 150 160, 170 180)') shape from dual),
vertices as (
select level as vertex_index from dual connect by level <= (select max(sdo_util.getnumvertices(shape)) from data))
select
d.asset_id,
v.vertex_index,
sdo_util.get_coordinate(d.shape,v.vertex_index) as sdo_geom_point, --the ordinates are stored in the SDO_GEOMETRY's SDO_POINT attribute. Example: MDSYS.SDO_POINT_TYPE(10, 20, NULL)
sdo_util.get_coordinate(d.shape,v.vertex_index).sdo_point.x as x,
sdo_util.get_coordinate(d.shape,v.vertex_index).sdo_point.y as y
from
data d
cross join
vertices v
where
v.vertex_index <= sdo_util.getnumvertices(d.shape)
order by
asset_id,
vertex_index
Result:
ASSET_ID VERTEX_INDEX SDO_GEOM_POINT X Y
---------- ------------ -------------------- ---------- ----------
100 1 [MDSYS.SDO_GEOMETRY] 10 20
100 2 [MDSYS.SDO_GEOMETRY] 30 40
200 1 [MDSYS.SDO_GEOMETRY] 50 60
200 2 [MDSYS.SDO_GEOMETRY] 70 80
200 3 [MDSYS.SDO_GEOMETRY] 90 100
300 1 [MDSYS.SDO_GEOMETRY] 110 120
300 2 [MDSYS.SDO_GEOMETRY] 130 140
300 3 [MDSYS.SDO_GEOMETRY] 150 160
300 4 [MDSYS.SDO_GEOMETRY] 170 180
I added the X & Y columns to the query to show what the [MDSYS.SDO_GEOMETRY] values represent. I don't actually need the X&Y columns in my query.
Edit:
I borrowed @MT0's cross join lateral technique and adapted it for SDO_GEOMETRY instead of MDSYS.ST_POINT.
It's cleaner than my original cross join / connect by level approach.
with cte (asset_id, shape) as (
select 100, sdo_geometry('linestring (10 20, 30 40)') from dual union all
select 200, sdo_geometry('linestring (50 60, 70 80, 90 100)') from dual union all
select 300, sdo_geometry('linestring (110 120, 130 140, 150 160, 170 180)') from dual
)
select c.asset_id,
vertex_index,
p.point,
sdo_util.get_coordinate(c.shape,p.vertex_index).sdo_point.x as x,
sdo_util.get_coordinate(c.shape,p.vertex_index).sdo_point.y as y
from cte c
cross join lateral (
select sdo_util.get_coordinate(c.shape,level) as point, level as vertex_index
from dual
connect by level <= sdo_util.getnumvertices(c.shape)
) p;
The result is the same:
ASSET_ID VERTEX_INDEX POINT X Y
---------- ------------ -------------------- ---------- ----------
100 1 [MDSYS.SDO_GEOMETRY] 10 20
100 2 [MDSYS.SDO_GEOMETRY] 30 40
200 1 [MDSYS.SDO_GEOMETRY] 50 60
200 2 [MDSYS.SDO_GEOMETRY] 70 80
200 3 [MDSYS.SDO_GEOMETRY] 90 100
300 1 [MDSYS.SDO_GEOMETRY] 110 120
300 2 [MDSYS.SDO_GEOMETRY] 130 140
300 3 [MDSYS.SDO_GEOMETRY] 150 160
300 4 [MDSYS.SDO_GEOMETRY] 170 180

Pandas - Finding percent contributed by each group

I am trying to find the percentage contribution made by each date group. Given below is how my data looks.
I expect to find the contribution of each product for a given date.
date, product, quantity
2020-01, prod_a, 100
2020-01, prod_b, 200
2020-01, prod_c, 20
2020-01, prod_d, 50
2020-02, prod_a, 30
2020-02, prod_b, 30
2020-02, prod_c, 40
My expected output would be as below:
date, product, quantity, prct_contributed
2020-01, prod_a, 100, 27%
2020-01, prod_b, 200, 54%
2020-01, prod_c, 20, 5%
2020-01, prod_d, 50, 14%
2020-02, prod_a, 30, 30%
2020-02, prod_b, 30, 30%
2020-02, prod_c, 40, 40%
Use groupby().transform():
df['quantity'] / df.groupby('date')['quantity'].transform('sum')
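To fold that one-liner back into the frame and match the formatting in the question, something like this sketch works (the rounding to whole percents is an assumption based on the expected output):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'date': ['2020-01'] * 4 + ['2020-02'] * 3,
    'product': ['prod_a', 'prod_b', 'prod_c', 'prod_d', 'prod_a', 'prod_b', 'prod_c'],
    'quantity': [100, 200, 20, 50, 30, 30, 40],
})

# transform('sum') broadcasts each date's total back onto its own rows,
# so the division is row-aligned with the original frame.
df['prct_contributed'] = (
    df['quantity'] / df.groupby('date')['quantity'].transform('sum') * 100
).round().astype(int).astype(str) + '%'
```

This reproduces the percentages shown in the expected output (27%, 54%, 5%, 14% for 2020-01 and 30%, 30%, 40% for 2020-02).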

Pandas Dynamically Remove Row

I have a data set which contains account_number, date, balance, interest charged, and code. This is accounting data, so transactions are posted and then reversed if there was a mistake by the data provider, which means things can be posted and reversed multiple times.
Account_Number Date Balance Interest Charged Code
0012 01/01/2017 1,000,000 $ 50.00 Posted
0012 01/05/2017 1,000,000 $-50.00 Reversed
0012 01/07/2017 1,000,000 $ 50.00 Posted
0012 01/10/2017 1,000,000 $-50.00 Reversed
0012 01/15/2017 1,000,000 $50.00 Posted
0012 01/17/2017 1,500,000 $25.00 Posted
0012 01/18/2017 1,500,000 $-25.00 Reversed
Looking at the data set above, I am trying to figure out a way to look at every row by account number and balance: if there is an inverse charge, both of those rows should be removed, and a charge should only be kept if there is no corresponding reversal for it (01/15/2017). For example, on 01/01/2017 a charge of 50.00 dollars was posted on a balance of 1,000,000, and on 01/05/2017 the charge was reversed on the same balance, so both of these rows should be thrown out. The same goes for 01/07 and 01/10.
I am not too sure how to code out this problem - any ideas or tips would be great!
So the problem with a question like this is that there are many corner cases, and handling them may or may not depend on how the data is already processed. That being said, here is one solution, assuming:
For each account number and balance, the row for each Reversed transaction comes just after the corresponding posting.
>>> import pandas as pd
>>> from datetime import date
>>> df = pd.DataFrame(data = [
...     ['0012', date(2017, 1, 1), 1000000, 50, 'Posted'], ['0012', date(2017, 1, 5), 1000000, -50, 'Reversed'],
...     ['0012', date(2017, 1, 7), 1000000, 50, 'Posted'], ['0012', date(2017, 1, 10), 1000000, -50, 'Reversed'],
...     ['0012', date(2017, 1, 15), 1000000, 50, 'Posted'], ['0012', date(2017, 1, 17), 1500000, 25, 'Posted'],
...     ['0012', date(2017, 1, 18), 1500000, -25, 'Reversed']],
...     columns=['Account_Number', 'Date', 'Balance', 'Interest Charged', 'Code'])
>>> df
Account_Number Date Balance Interest Charged Code
0 0012 2017-01-01 1000000 50 Posted
1 0012 2017-01-05 1000000 -50 Reversed
2 0012 2017-01-07 1000000 50 Posted
3 0012 2017-01-10 1000000 -50 Reversed
4 0012 2017-01-15 1000000 50 Posted
5 0012 2017-01-17 1500000 25 Posted
6 0012 2017-01-18 1500000 -25 Reversed
>>> def f(df_g):
...     idx = df_g[df_g['Code'] == 'Reversed'].index
...     return df_g.loc[~df_g.index.isin(idx.union(idx - 1)), ['Date', 'Interest Charged', 'Code']]
>>> df.groupby(['Account_Number', 'Balance']).apply(f).reset_index().loc[:, df.columns]
Account_Number Date Balance Interest Charged Code
0 0012 2017-01-15 1000000 50 Posted
How it works: for each combination of account number and balance, I look at the rows marked Reversed and remove them plus the row just before each one.
EDIT: To make it slightly more robust (it now pairs rows based on amount, balance and account number):
>>> df = pd.DataFrame(data = [
...     ['0012', date(2017, 1, 1), 1000000, 53, 'Posted'], ['0012', date(2017, 1, 7), 1000000, 50, 'Posted'],
...     ['0012', date(2017, 1, 5), 1000000, -50, 'Reversed'], ['0012', date(2017, 1, 10), 1000000, -53, 'Reversed'],
...     ['0012', date(2017, 1, 15), 1000000, 50, 'Posted'], ['0012', date(2017, 1, 17), 1500000, 25, 'Posted'],
...     ['0012', date(2017, 1, 18), 1500000, -25, 'Reversed']],
...     columns=['Account_Number', 'Date', 'Balance', 'Interest Charged', 'Code'])
>>> df
Account_Number Date Balance Interest Charged Code
0 0012 2017-01-01 1000000 53 Posted
1 0012 2017-01-07 1000000 50 Posted
2 0012 2017-01-05 1000000 -50 Reversed
3 0012 2017-01-10 1000000 -53 Reversed
4 0012 2017-01-15 1000000 50 Posted
5 0012 2017-01-17 1500000 25 Posted
6 0012 2017-01-18 1500000 -25 Reversed
>>> output_cols = df.columns
>>> df['ABS_VALUE'] = df['Interest Charged'].abs()
>>> def f(df_g):
...     df_g = df_g.reset_index()  # added this line
...     idx = df_g[df_g['Code'] == 'Reversed'].index
...     return df_g.loc[~df_g.index.isin(idx.union(idx - 1)), ['Date', 'Interest Charged', 'Code']]
>>> df.groupby(['Account_Number', 'Balance', 'ABS_VALUE']).apply(f).reset_index().loc[:, output_cols]
Account_Number Date Balance Interest Charged Code
0 0012 2017-01-15 1000000 50 Posted
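An alternative sketch that does not rely on a reversal sitting near its posting at all: number postings and reversals separately per (account, balance, absolute amount) with cumcount, pair them by that number, and keep only unmatched postings. The helper columns abs_amt and pair_no are my own names, not from the question.

```python
import pandas as pd
from datetime import date

df = pd.DataFrame(
    [['0012', date(2017, 1, 1), 1000000, 50, 'Posted'],
     ['0012', date(2017, 1, 5), 1000000, -50, 'Reversed'],
     ['0012', date(2017, 1, 7), 1000000, 50, 'Posted'],
     ['0012', date(2017, 1, 10), 1000000, -50, 'Reversed'],
     ['0012', date(2017, 1, 15), 1000000, 50, 'Posted'],
     ['0012', date(2017, 1, 17), 1500000, 25, 'Posted'],
     ['0012', date(2017, 1, 18), 1500000, -25, 'Reversed']],
    columns=['Account_Number', 'Date', 'Balance', 'Interest Charged', 'Code'])

# Number the postings and the reversals separately within each
# (account, balance, |amount|) group, so the n-th reversal cancels
# the n-th posting of the same size.
df['abs_amt'] = df['Interest Charged'].abs()
df['pair_no'] = df.groupby(
    ['Account_Number', 'Balance', 'abs_amt', 'Code']).cumcount()

keys = ['Account_Number', 'Balance', 'abs_amt', 'pair_no']
posted = df[df['Code'] == 'Posted']
reversals = df[df['Code'] == 'Reversed']

# A posting survives only if no reversal shares its pairing key.
result = (posted.merge(reversals[keys], on=keys, how='left', indicator=True)
                .query("_merge == 'left_only'")
                .drop(columns=['abs_amt', 'pair_no', '_merge']))
```

On the sample data this leaves just the unreversed 2017-01-15 posting, the same as the groupby solution above.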

multilevel columns set as index in pivot_table

I have a data frame (df) with multi column headers:
yearQ YearC YearS Type1 Type2
index City State Year1 Year2 Year3 Year4 Year5 Year6
0 New York NY 355 189 115 234 178 422
1 Los Angeles CA 100 207 298 230 214 166
2 Chicago IL 1360 300 211 121 355 435
3 Philadelphia PA 270 156 455 232 532 355
4 Phoenix AZ 270 234 112 432 344 116
I want to compute the average number for each type. The final format should look like the following:
City State Type1 Type2
New York NY avg of(355+189+115) avg of (234+178+422)
.......
Can anybody give me a hint?
Many thanks.
Kath
I think you can use groupby by the first level of the MultiIndex in columns and aggregate with sum (use mean instead if you want the average):
print (df.index)
MultiIndex(levels=[[0, 1, 2, 3, 4],
['Chicago', 'Los Angeles', 'New York', 'Philadelphia', 'Phoenix'],
['AZ', 'CA', 'IL', 'NY', 'PA']],
labels=[[0, 1, 2, 3, 4], [2, 1, 0, 3, 4], [3, 1, 2, 4, 0]])
print (df.columns)
MultiIndex(levels=[['Type1', 'Type2'],
['Year1', 'Year2', 'Year3', 'Year4', 'Year5', 'Year6']],
labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 3, 4, 5]],
names=['YearQ', 'index'])
df = df.groupby(axis=1, level=0).sum()
print (df)
YearQ Type1 Type2
0 New York NY 659 834
1 Los Angeles CA 605 610
2 Chicago IL 1871 911
3 Philadelphia PA 881 1119
4 Phoenix AZ 616 892
But it may be necessary to call set_index first:
print (df.index)
Int64Index([0, 1, 2, 3, 4], dtype='int64')
print (df.columns)
MultiIndex(levels=[['Type1', 'Type2', 'YearC', 'YearS'],
['City', 'State', 'Year1', 'Year2', 'Year3', 'Year4', 'Year5', 'Year6']],
labels=[[2, 3, 0, 0, 0, 1, 1, 1], [0, 1, 2, 3, 4, 5, 6, 7]],
names=['YearQ', 'index'])
df = df.set_index([('YearC','City'), ('YearS','State')])
df = df.groupby(axis=1, level=0).sum()
print (df)
YearQ Type1 Type2
(YearC, City) (YearS, State)
New York NY 659 834
Los Angeles CA 605 610
Chicago IL 1871 911
Philadelphia PA 881 1119
Phoenix AZ 616 892
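Since the question asks for the average rather than the sum, here is a sketch with mean; it transposes instead of using groupby(axis=1), which is deprecated in recent pandas, and only rebuilds two sample rows to keep it short:

```python
import pandas as pd

# Rebuild a small version of the frame with two column levels
cols = pd.MultiIndex.from_tuples(
    [('YearC', 'City'), ('YearS', 'State'),
     ('Type1', 'Year1'), ('Type1', 'Year2'), ('Type1', 'Year3'),
     ('Type2', 'Year4'), ('Type2', 'Year5'), ('Type2', 'Year6')],
    names=['YearQ', 'index'])
df = pd.DataFrame(
    [['New York', 'NY', 355, 189, 115, 234, 178, 422],
     ['Los Angeles', 'CA', 100, 207, 298, 230, 214, 166]],
    columns=cols)

df = df.set_index([('YearC', 'City'), ('YearS', 'State')])
# Grouping the transposed rows by the first column level ('YearQ')
# averages Year1-Year3 into Type1 and Year4-Year6 into Type2.
out = df.T.groupby(level=0).mean().T
```

For New York this gives Type1 = (355 + 189 + 115) / 3 and Type2 = (234 + 178 + 422) / 3, matching the "avg of(...)" columns the question sketches.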

Selecting Periods from Pricetable

What I am trying to do is select the price that is valid for a date range for each product.
Table:
create table #tblPreis
(
[Date] date,
[Layer] int,
[Site] int,
[item] int,
[price] money
)
insert into #tblPreis
values
('2015-08-19', 1, 1, 10, 0.90),
('2015-08-18', 1, 1, 10, 0.50),
('2015-08-17', 1, 1, 10, 0.50),
('2015-08-16', 1, 1, 10, 2.00),
('2015-08-15', 2, 1, 10, 2.00),
('2015-08-14', 1, 1, 10, 1.00),
('2015-08-19', 3, 1, 12, 3.00),
('2015-08-18', 1, 1, 9, 7.00),
('2015-08-17', 1, 1, 9, 8.00),
('2015-08-16', 1, 1, 9, 8.00),
('2015-08-15', 1, 1, 9, 8.00),
('2015-01-01', 1, 1, 9, 7.00);
What I would like to receive is
DateStart DateEnd Item Price
2015-01-01 2015-08-14 9 7.00
2015-08-15 2015-08-17 9 8.00
2015-08-18 2015-08-18 9 7.00
2015-08-19 2015-08-19 12 3.00
2015-08-14 2015-08-14 10 1.00
2015-08-15 2015-08-16 10 2.00
2015-08-17 2015-08-18 10 0.50
2015-08-19 2015-08-19 10 0.90
As you can see, the Layer and the other columns besides date, item and price don't interest me.
Thank you for any advice.
Dimi
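The expected output is a classic gaps-and-islands result: collapse consecutive same-price rows per item, then extend each range to the day before the next range starts (the last range ends on its own last date). A T-SQL sketch along those lines (ignoring Layer and Site as noted, and ordering by item rather than in the order shown above):

```sql
with islands as (
    -- consecutive rows with the same price per item fall into one grp
    select [item], [price],
           min([Date]) as DateStart,
           max([Date]) as MaxDate
    from (
        select [Date], [item], [price],
               row_number() over (partition by [item] order by [Date])
             - row_number() over (partition by [item], [price] order by [Date]) as grp
        from #tblPreis
    ) x
    group by [item], [price], grp
)
select DateStart,
       -- valid until the day before the next price starts,
       -- or the island's own last date if there is no next price
       coalesce(
           dateadd(day, -1,
                   lead(DateStart) over (partition by [item] order by DateStart)),
           MaxDate) as DateEnd,
       [item] as Item,
       [price] as Price
from islands
order by [item], DateStart;
```

This is an untested sketch against the sample data above; on it, item 9's 2015-01-01 price of 7.00 should run through 2015-08-14, matching the first expected row.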