I have a table in the below format:
Pan_no   ANA_Code   R_units   R_price   absolute_returns
BBJ      Equity     1.5       500       15000
AAX      Debt       2.0       1500      3000
EDF      Debt       3.0       500       -91
Like the sample data above, I have 10,000,000 records. Now I need another column where the absolute_returns column is divided into bins (groups) and put into 5 buckets with values 1, 2, 3, 4, 5; then I need to find sum(r_price) and sum(r_units) grouped by pan_no, ana_code, and bins (this bins being the new column that will be created).
I tried to achieve the above with the below code:
select
pan_no, ana_code,
sum(r_units), sum(r_price),
ntile(5) over (order by absolute_returns) as bins
from
table1
group by
pan_no, ana_code, bins;
What am I missing in my code? I am just trying to create 5 bins for the absolute_returns column, sum up r_price and r_units, and group the data by pan_no, ana_code, and bins, but the code doesn't work.
I am guessing you are using SQL Server:
select
pan_no, ana_code,
sum(r_units), sum(r_price),
ntile(5) OVER (PARTITION BY pan_no, ana_code ORDER BY absolute_returns ASC) as bins
from
table1
group by
pan_no, ana_code;
Change asc to desc if needed.
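If the engine still complains that absolute_returns is invalid in the select list (in a grouped query a window function can only reference grouping columns or aggregates), computing the bin per row in a derived table and then aggregating by it is a common workaround. A minimal sketch, assuming SQL Server and the table1 columns from the question:
select
    pan_no, ana_code, bins,
    sum(r_units) as total_r_units,
    sum(r_price) as total_r_price
from (
    -- bin each row first, then aggregate by the bin
    select
        pan_no, ana_code, r_units, r_price,
        ntile(5) over (order by absolute_returns) as bins   -- whole-table quintiles, as the question describes
    from table1
) binned
group by pan_no, ana_code, bins;
Keep the PARTITION BY pan_no, ana_code inside the NTILE if the quintiles should instead be computed per group rather than over the whole table.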
DATA Explanation
I have two data tables, one (PAGE VIEWS) which represents user events (CV 1,2,3 etc) and associated timestamp with member ID. The second table (ORDERS) represents the orders made - event time & order value. Membership ID is available on each table.
Table 1 - PAGE VIEWS (1,000 Rows in Total)
Event_Day    Member ID   CV1      CV2     CV3       CV4
11/5/2021    115126      APP      camp1   Trigger   APP-camp1-Trigger
11/14/2021   189192      SEARCH   camp4   Search    SEARCH-camp4-Search
11/5/2021    193320      SEARCH   camp5   Search    SEARCH-camp5-Search
Table 2 - ORDERS (249 rows in total)
Date         Purchase Order ID   Membership Number   Order Value
7/12/2021    0088                183300              29.34
18/12/2021   0180                132159              132.51
4/12/2021    0050                141542              24.35
What I'm trying to answer
I'd like to attribute the order value in ORDERS to the CV columns in PAGE VIEWS, using the earliest event date in PAGE VIEWS. This would be a simple attribution use case.
Issues
I've spent the weekend researching and scrolling through a variety of online articles, but the closest I've come is the following query:
Select min (event_day) As "first date",member_id,cv2,order_value,purchase_order_id
from mta_app_allpages,mta_app_orders
where member_id = membership_number
group by member_id,cv2,order_value,purchase_order_id;
The resulting data is correct in the DISTINCT sense, since Row 2 differs from Row 1, but I'd like to keep only Row 1 for member_id 113290, only Row 3 for member_id 170897, and so on.
Date         member_id   cv2     Order Value
2021-11-01   113290      camp5   58.81
2021-11-05   113290      camp4   58.51
2021-11-03   170897      camp3   36.26
2021-11-09   170897      camp5   36.26
2021-11-24   170897      camp1   36.26
I've tried using partition and sub-query functions with little success. The correct query should return a maximum of 249 rows, as that is how many rows I have in the ORDERS table.
First-time poster so hopefully I have the format right. Many thanks.
Using RANK() is the best approach:
select * from
(
select *, RANK() OVER (partition by membership_number order by Event_Day) as rnk
from page_views as pv
INNER JOIN orders as o
ON pv.Member_ID=o.Membership_Number
) as q
where rnk=1
This will only fetch the minimum event_day.
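If two page views share a member's earliest Event_Day and only one attributed row per order is wanted, ROW_NUMBER() could be swapped in for RANK(); a minimal sketch using the same table and column names as above:
select *
from
(
    -- ROW_NUMBER() keeps exactly one row per member even when two
    -- page views share the earliest Event_Day (RANK() would keep both)
    select *, ROW_NUMBER() OVER (partition by Membership_Number order by Event_Day) as rn
    from page_views as pv
    INNER JOIN orders as o
    ON pv.Member_ID = o.Membership_Number
) as q
where rn = 1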
However, you can use MIN() to achieve the same result (with a more complex sub-query):
select *
from
(select pv.*
from page_views as pv
inner join
(
select Member_ID, min(event_day) as mn_dt
from page_views
group by member_id
) as mn
ON mn.Member_ID=pv.Member_ID and mn.mn_dt=pv.event_day
) as sq
INNER JOIN orders as o
ON sq.Member_ID=o.Membership_Number
Both the queries will get us the same answer.
See the demo in db<>fiddle
I created the select query below. Now I need a separate Total row for the "No.of Ideas generated" column, which sums the individual counts of each idea_sector and idea_industry combination.
Query:
select c.idea_sector,c.idea_industry,
count(*) as "No.of Ideas generated"
from hackathon2k21.consolidated_report c
group by idea_sector,idea_industry
order by idea_sector ,idea_industry
Output:
----------------------------------------------------------------------
idea_sector idea_industry No.of Ideas generated
-----------------------------------------------------------------------
COMMUNICATION-ROC TELECOMMUNICATIONS 1
Cross Sector Cross Industry 5
DISTRIBUTION TRAVEL AND TRANSPORTATION 1
FINANCIAL SERVICES BANKING 1
PUBLIC HEALTHCARE 1
Required output:
----------------------------------------------------------------------
idea_sector idea_industry No.of Ideas generated
-----------------------------------------------------------------------
COMMUNICATION-ROC TELECOMMUNICATIONS 1
Cross Sector Cross Industry 5
DISTRIBUTION TRAVEL AND TRANSPORTATION 1
FINANCIAL SERVICES BANKING 1
PUBLIC HEALTHCARE 1
------------------------------------------------------------------------
Total 9
You can accomplish this with grouping sets. That's where we tell Postgres, in the GROUP BY clause, all of the different ways we would like the result set grouped for the aggregated column(s).
SELECT
c.idea_sector,
c.idea_industry,
count(*) as "No.of Ideas generated"
FROM hackathon2k21.consolidated_report c
GROUP BY
    GROUPING SETS (
        (idea_sector, idea_industry),
        ()
    )
ORDER BY idea_sector, idea_industry;
This generates two grouping sets: one that groups at the idea_sector, idea_industry granularity, like your existing SQL, and another that groups by nothing, essentially producing a full-table Total.
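If the total row should also carry the Total label shown in the required output, a COALESCE on the grouping column can supply it; a sketch, assuming PostgreSQL and the same table:
SELECT
    COALESCE(c.idea_sector, 'Total') AS idea_sector,   -- NULL here only for the empty grouping set
    c.idea_industry,
    count(*) as "No.of Ideas generated"
FROM hackathon2k21.consolidated_report c
GROUP BY
    GROUPING SETS (
        (idea_sector, idea_industry),
        ()
    )
ORDER BY c.idea_sector NULLS LAST, c.idea_industry;    -- total row sorts to the bottom
If idea_sector itself can be NULL in the data, GROUPING(idea_sector) = 1 is the safer test for the grand-total row.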
The easiest way seems to be adding a UNION ALL operator like this:
select c.idea_sector,c.idea_industry,
count(*) as "No.of Ideas generated"
from hackathon2k21.consolidated_report c
group by idea_sector,idea_industry
--order by idea_sector ,idea_industry
UNION ALL
SELECT 'Total', NULL, COUNT(*)
from hackathon2k21.consolidated_report
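One caveat with UNION ALL: the commented-out ORDER BY can't simply be re-enabled on the first branch, since ORDER BY is not allowed directly before UNION ALL. To keep the detail rows sorted and the Total row last, the union can be wrapped and ordered from the outside; a sketch, with a hypothetical sort_key column added purely for ordering:
SELECT idea_sector, idea_industry, "No.of Ideas generated"
FROM (
    select c.idea_sector, c.idea_industry,
           count(*) as "No.of Ideas generated",
           0 as sort_key                        -- detail rows
    from hackathon2k21.consolidated_report c
    group by idea_sector, idea_industry
    UNION ALL
    SELECT 'Total', NULL, COUNT(*), 1           -- grand total
    from hackathon2k21.consolidated_report
) u
ORDER BY sort_key, idea_sector, idea_industry;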
Here's an easy one. I have a sales table that looks like this:
store_id industry_code sales_person_1 sales_person_2 ... sales_person_n
1 1000 20.75 15.50 ... 100
2 2000 15.54 16.84 ... 125
Suppose I want to find out which quantile sales_person_2 falls into for store_id=1. I know I can use a window function ntile(5) OVER(PARTITION BY ____ ORDER BY SUM(__) DESC) to divide a column into 5 buckets and use that to identify which bucket an arbitrary value falls into. What's the best way to do that across columns rather than within a column?
What you can do is explode your columns into several rows:
select t.store_id,
t.industry_code,
s.val
from test_table t
lateral view explode(array(sales_person_1, sales_person_2, ..., sales_person_n)) s as val
and only then use ntile.
See the example from the Hive docs.
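To complete the picture, the NTILE step can then run over the exploded values; a sketch, still assuming Hive and the hypothetical test_table / sales_person_* columns from above (the column list stays elided as in the answer):
select store_id,
       industry_code,
       val,
       -- 5 buckets per store, with the highest values in bucket 1
       ntile(5) over (partition by store_id order by val desc) as bucket
from (
    select t.store_id,
           t.industry_code,
           s.val
    from test_table t
    lateral view explode(array(sales_person_1, sales_person_2, ..., sales_person_n)) s as val
) exploded;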
I hope my title is OK as I really don't know what to call it.
Anyway, I have a table with the following columns:
ID - Num (Primary Key)
Category - VarChar
Name - VarChar
DateForName - Date
The data looks like this:
ID   Category   Name   DateForName
1    100        111    31/12/2017
2    101        210    30/12/2017
3    100        112    29/12/2017
4    101        203    27/12/2017
5    100        117    20/12/2017
6    103        425    08/12/2017
To generate this table, I just sorted by date DESC.
Is there a way to add a new column with the order per Category, like:
ID   Category | Order
1    100      | 1
2    101      | 1
3    100      | 2
4    101      | 2
5    100      | 3
6    103      | 1
Max
You want the analytical function row_number():
select t.*
from (select *, row_number() over (partition by Category order by DateForName desc) as Seq
      from table
     ) t
order by id;
Yes, SQL has a couple of options for adding a column that is populated with a ranking of the rows based on the Category and DateForName columns.
If you just want to add a column to the select statement, I recommend using the RANK() function.
See more details here:
https://learn.microsoft.com/en-us/sql/t-sql/functions/rank-transact-sql?view=sql-server-2017
For your current table, try the following select statement:
SELECT
[ID],
[Category],
[Name],
[DateForName],
RANK() OVER (PARTITION BY [Category] ORDER BY [DateForName] DESC) AS [CategoryOrder]
FROM [TableName]
Alternatively, if you want to add a permanent column (aka a field) to the existing table, I recommend treating this as a calculated column. See more information here:
https://learn.microsoft.com/en-us/sql/relational-databases/tables/specify-computed-columns-in-a-table?view=sql-server-2017
Because the new column would be based entirely on two pre-existing columns (and only those two), SQL Server can do a good job of maintaining it for you.
Hope this helps!
So I have a table with a set of information like this:
name Type PRICE
11111 XX 0.001
22222 YY 0.002
33333 ZZ 0.0001
11111 YY 0.021
11111 ZZ 0.0111
77777 YY 0.1
77777 ZZ 1.2
Now these numbers go on for about a million rows, and there could be upwards of 20 rows with the same name mapping to 20 different TYPE values. But there will only be one unique type per name: what I mean is that 11111 could have XX, YY, ZZ on it, but it cannot have YY, ZZ, YY on it.
What I need is to get the lowest 3 prices and what TYPE they are per name.
Right now I can get the lowest price per name by doing:
select name, type, min(price) from table group by name;
However that is just the lowest price, and I need the lowest 3 prices. I've been trying for a couple of days and I can't seem to get it. All help is appreciated.
Also, please let me know if I forgot any information; I'm still trying to figure out Stack Overflow :P
Oh, and the database is a NoSQL database that uses SQL syntax.
Edit: I can't seem to get the formatting right for the example data from my table.
If your database supports window functions, and allowing for the possibility that there may be more than three rows in your data with any of the three lowest prices, this should do it:
select the_table.*
from
the_table
inner join (
select name, price
from (
select name, price, row_number() over(partition by name order by price) as rn
from the_table) as x
where rn < 4
) as y on y.name=the_table.name and y.price=the_table.price;
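If exactly three rows per name are enough (ties at the cutoff broken arbitrarily), the join back to the_table can be dropped and type carried through the derived table instead; a sketch:
select name, type, price
from (
    -- rn = 1..3 marks each name's three cheapest rows
    select name, type, price,
           row_number() over (partition by name order by price) as rn
    from the_table
) as x
where rn < 4;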