How can I write PySpark code to get the output DataFrame below?

Input:
Buy-A
Date | Time | Qty | Per Share price | Total Value
15 | 10 | 10 | 10 | 100
15 | 14 | 20 | 10 | 200
Sell-B
Date | Time | Qty | Per Share price | Total Value
15 | 15 | 15 | 20 | 300
Output:
Date | Buy Time | Buy Qty | Buy Per Share price | Buy Total Value | Sell Qty | Sell Per Share Price | Sell Total Value
15 | 10 | 10 | 10 | 100 | 10 | 20 | 200
15 | 14 | 5 | 10 | 50 | 5 | 20 | 100
15 | 14 | 15 | 10 | 150 | | |
The solution needs to use PySpark.
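The expected output suggests first-in, first-out matching: the sell quantity is consumed by the earliest buy first, and any unmatched buy quantity gets its own row with empty sell columns. A minimal plain-Python sketch of that allocation logic follows. It is not PySpark itself; the function name `fifo_match` and the tuple layout are assumptions made for illustration, and the sketch assumes all rows share one date and does not check that a sell happens after the buy it is matched to.

```python
def fifo_match(buys, sells):
    """Match sells against buys in time order (FIFO).

    buys and sells are lists of (date, time, qty, price) tuples.
    Returns rows shaped like the expected output table.
    """
    # Mutable [qty, price] pairs for the sells, earliest first.
    sell_queue = [[qty, price]
                  for _, _, qty, price in sorted(sells, key=lambda s: s[1])]
    rows = []
    for date, time, qty, price in sorted(buys, key=lambda b: b[1]):
        remaining = qty
        while remaining > 0 and sell_queue:
            sell = sell_queue[0]
            matched = min(remaining, sell[0])
            rows.append((date, time, matched, price, matched * price,
                         matched, sell[1], matched * sell[1]))
            remaining -= matched
            sell[0] -= matched
            if sell[0] == 0:
                sell_queue.pop(0)
        if remaining > 0:
            # Unsold part of this buy: sell columns stay empty.
            rows.append((date, time, remaining, price, remaining * price,
                         None, None, None))
    return rows


buys = [(15, 10, 10, 10), (15, 14, 20, 10)]
sells = [(15, 15, 15, 20)]
result = fifo_match(buys, sells)
for row in result:
    print(row)
```

In PySpark the same allocation is often expressed with running totals over a window ordered by time, but that translation is not shown here.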

Related

Is there a way in Snowflake to get a table of 10 percentiles?

I have a table like that:
userID time_spent Dollars_Paid date
1001 50 5 02/06/2022
1002 70 10 02/06/2022
1004 20 1 02/06/2022
1005 30 4 02/06/2022
I want to calculate the 10th, 20th, 30th, 40th, 50th, ..., 90th, and 99th percentiles of time_spent, and then get the average Dollars_Paid for each percentile.
Something like this:
Percentile | time_spent | avg dollar | date
10 | 4 | 2.3 | 02/06/2022
20 | 20 | 4.5 | 02/06/2022
30 | 35 | 6.4 | 02/06/2022
40 | 50 | 6.2 | 02/06/2022
How can I do that?
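One possible reading of the request is: order users by time_spent, split them into equal-sized buckets, and report each bucket's upper time_spent value and average Dollars_Paid. The sketch below shows that bucketing logic in plain Python rather than Snowflake SQL. The function name `bucket_averages` is hypothetical, and only 2 buckets are used because the sample table has just 4 rows.

```python
def bucket_averages(rows, n_buckets):
    """rows: (userID, time_spent, dollars_paid) tuples.

    Returns (bucket, max time_spent in bucket, average dollars_paid).
    """
    ordered = sorted(rows, key=lambda r: r[1])
    buckets = {}
    for i, (_, time_spent, dollars) in enumerate(ordered):
        # Rank-based bucket number, 1..n_buckets.
        bucket = i * n_buckets // len(ordered) + 1
        buckets.setdefault(bucket, []).append((time_spent, dollars))
    return [
        (b, max(t for t, _ in members),
         sum(d for _, d in members) / len(members))
        for b, members in sorted(buckets.items())
    ]


rows = [(1001, 50, 5), (1002, 70, 10), (1004, 20, 1), (1005, 30, 4)]
halves = bucket_averages(rows, 2)
print(halves)
```

In Snowflake the bucketing step would typically be done with the NTILE window function partitioned by date, followed by a GROUP BY on the bucket number.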

How can I transfer data from one column to another based on another column's values in SQL?

select Products, Fiscal_year, Fiscal_Period, Stock_QTY, DaysRemaining,
(Stock_QTY / DaysRemaining) as QtyforPeriod,
Stock_QTY -(Stock_QTY / DaysRemaining) as LeftforNextmonth
from Stocks
Products | Fiscal_year | Fiscal_Period | Stock_QTY | DaysRemaining | QtyforPeriod | LeftforNextmonth
5000 | 22 | 1 | 100 | 4 | |
6000 | 22 | 1 | 200 | 4 | |
7000 | 22 | 2 | 300 | 20 | |
7000 | 22 | 3 | 400 | 40 | |
8000 | 23 | 1 | 500 | 60 | |
5000 | 23 | 1 | 600 | 60 | |
7000 | 23 | 2 | 700 | 90 | |
8000 | 23 | 3 | 800 | 100 | |
Is it possible to write a query so that, when Fiscal_year = 22 and Fiscal_Period = 4, it subtracts the LeftforNextmonth of period 3 from Stock_QTY and divides the result by DaysRemaining?
Likewise, when Fiscal_year = 22 and Fiscal_Period = 5, it should subtract the LeftforNextmonth of period 4 from Stock_QTY and divide by DaysRemaining.
And when Fiscal_year = 22 and Fiscal_Period = 6, it should subtract the LeftforNextmonth of period 5 from Stock_QTY and divide by DaysRemaining.
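The rule described above is a running calculation: each period needs the LeftforNextmonth value of the period before it. The sketch below shows one possible reading of that rule in plain Python, using the two rows for product 7000 in fiscal year 22. The function name `carry_forward` is hypothetical, and it assumes nothing is carried into the first listed period.

```python
def carry_forward(periods):
    """periods: (fiscal_period, stock_qty, days_remaining) tuples for one
    product and fiscal year, in period order.

    Returns (fiscal_period, qty_for_period, left_for_next_month).
    """
    results = []
    prev_left = 0.0  # assumption: nothing carried into the first period
    for period, stock_qty, days in periods:
        qty_for_period = (stock_qty - prev_left) / days
        left_for_next = stock_qty - qty_for_period
        results.append((period, qty_for_period, left_for_next))
        prev_left = left_for_next
    return results


rows_7000 = [(2, 300, 20), (3, 400, 40)]
out = carry_forward(rows_7000)
for row in out:
    print(row)
```

Because each row depends on the computed value of the previous row, a plain LAG over the stored columns may not be enough in SQL; a recursive query is one common way to express this kind of dependency.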

Finding Max Price and displaying multiple columns SQL

I have a table that looks like this:
customer_id item price cost
1 Shoe 120 36
1 Bag 180 50
1 Shirt 30 9
2 Shoe 150 40
3 Shirt 30 9
4 Shoe 120 36
5 Shorts 65 14
I am trying to find the most expensive item each customer bought along with the cost of item and the item name.
I'm able to do the first part:
SELECT customer_id, max(price)
FROM sales
GROUP BY customer_id;
Which gives me:
customer_id price
1 180
2 150
3 30
4 120
5 65
How do I get this output to also show me the item and its cost? The output should look like this:
customer_id price item cost
1 180 Bag 50
2 150 Shoe 40
3 30 Shirt 9
4 120 Shoe 36
5 65 Shorts 14
I'm assuming it's a SELECT statement within a SELECT? I would appreciate the help as I'm fairly new to SQL.
One method that usually has good performance is a correlated subquery:
select s.*
from sales s
where s.price = (select max(s2.price)
from sales s2
where s2.customer_id = s.customer_id
);
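As a quick check, the correlated subquery can be run against the sample data with Python's built-in sqlite3 module. This is only a sketch: the in-memory database, the column types, and the added ORDER BY are assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer_id INTEGER, item TEXT, price INTEGER, cost INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "Shoe", 120, 36),
        (1, "Bag", 180, 50),
        (1, "Shirt", 30, 9),
        (2, "Shoe", 150, 40),
        (3, "Shirt", 30, 9),
        (4, "Shoe", 120, 36),
        (5, "Shorts", 65, 14),
    ],
)

# Keep only the rows whose price equals that customer's maximum price.
top_items = conn.execute(
    """
    SELECT s.customer_id, s.price, s.item, s.cost
    FROM sales s
    WHERE s.price = (SELECT MAX(s2.price)
                     FROM sales s2
                     WHERE s2.customer_id = s.customer_id)
    ORDER BY s.customer_id
    """
).fetchall()

for row in top_items:
    print(row)
```

Note that if a customer bought two items at the same maximum price, this query returns both rows for that customer.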

SQL. Split data over month based on expected hours

I really hope you can help me with this problem, which seems pretty complicated to me.
Dealid: DealprojectStartDate: expectedhours:
3534 2021-01-01 200
What I want is to split the weightamount across different months in the future, based on the expected number of hours.
I have the following distribution key for expected hours:
0-500 = 2 months
500-1500 = 4 months
1500-4000 = 6 months
4000 and above = 8 months
For example, in the observation above the start date is 01/01 and the expected hours are 200, so the weightamount should be split over 2 months -> Month 1 = 100 and Month 2 = 100.
Important note: if the start date is the first of the month, the allocation should begin in that month. For example, if the start date is the first of the month (01/05), it should be allocated to months 5 and 6, but if the start date were 07/05, it should be allocated to months 6 and 7.
What I think would work is to get a new table that splits the observation above into this:
Dealid: allocation date: expectedhours:
3534 2021-01-01(jan) 100
3534 2021-02-01(feb) 100
Hope you guys can help. Thanks.
Consider the following table:
min max count month_number
0 500 2 1
0 500 2 2
500 1500 4 1
500 1500 4 2
500 1500 4 3
500 1500 4 4
1500 4000 6 1
1500 4000 6 2
1500 4000 6 3
1500 4000 6 4
1500 4000 6 5
1500 4000 6 6
4000 1000000000 8 1
4000 1000000000 8 2
4000 1000000000 8 3
4000 1000000000 8 4
4000 1000000000 8 5
4000 1000000000 8 6
4000 1000000000 8 7
4000 1000000000 8 8
If we call that table calc_lookup and your table atable then this query will give you the results you want
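SELECT a.Dealid,
-- dateadd(month, <number of months>, <date>) is the argument order in SQL Server and Snowflake
dateadd(month, l.month_number - 1, a.dealprojectstartdate) as allocation_date,
a.expectedhours / l.count as expectedhours
FROM atable a
-- between is inclusive at both ends, so exactly 500, 1500, or 4000 hours matches two ranges
JOIN calc_lookup l on a.expectedhours between l.min and l.max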
You don't give details of the edge cases, so some changes may be needed (off by one, rounding, etc.).
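To make the edge cases concrete, here is a minimal plain-Python sketch of the allocation, including the first-of-the-month rule from the question. The function names and the boundary handling (lower bound inclusive, upper bound exclusive) are assumptions, since the question does not say which range a value of exactly 500 belongs to.

```python
from datetime import date

# (low, high, months): low is inclusive, high is exclusive (an assumption).
RANGES = [(0, 500, 2), (500, 1500, 4), (1500, 4000, 6), (4000, float("inf"), 8)]


def months_for(hours):
    """Return the number of months to spread the hours over."""
    for low, high, months in RANGES:
        if low <= hours < high:
            return months
    raise ValueError("hours must be non-negative")


def add_months(first_of_month, n):
    """Return the first day of the month n months after first_of_month."""
    index = first_of_month.year * 12 + (first_of_month.month - 1) + n
    return date(index // 12, index % 12 + 1, 1)


def allocate(deal_id, start, hours):
    """Split hours evenly over the allocation months for one deal."""
    first = date(start.year, start.month, 1)
    if start.day != 1:
        # Not the first of the month: allocation begins next month.
        first = add_months(first, 1)
    n = months_for(hours)
    return [(deal_id, add_months(first, i), hours / n) for i in range(n)]


print(allocate(3534, date(2021, 1, 1), 200))
print(allocate(3534, date(2021, 5, 7), 200))
```

The first call reproduces the two rows from the question (January and February, 100 hours each); the second starts on 7 May and is therefore allocated to June and July.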

Aggregate one measure by another in MS Analysis Services

There is a CUBE with two measures, Prices and Volumes, and a dimension, Hours. (The real CUBE is much more complex; this is a simplified version.)
Hours Prices Volumes
0 0 100
0 10 20
0 20 300
0 40 100
0 50 50
1 0 500
1 20 50
1 25 200
1 40 30
1 50 10
How can I aggregate Volumes by Prices and get the following result (probably by using an MDX query)?
Prices Volumes
0 600
10 20
20 350
25 200
40 130
50 60
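To pin down the intended result, the aggregation is a sum of Volumes grouped by the Prices value, ignoring Hours. The plain-Python sketch below reproduces the table above from the sample data; it only illustrates the target calculation and is not MDX.

```python
from collections import defaultdict

# (hour, price, volume) rows from the sample cube data.
facts = [
    (0, 0, 100), (0, 10, 20), (0, 20, 300), (0, 40, 100), (0, 50, 50),
    (1, 0, 500), (1, 20, 50), (1, 25, 200), (1, 40, 30), (1, 50, 10),
]

volume_by_price = defaultdict(int)
for _hour, price, volume in facts:
    volume_by_price[price] += volume

totals = sorted(volume_by_price.items())
for price, volume in totals:
    print(price, volume)
```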