This is done in Microsoft SQL Server 2008 R2.
I'll start out with an example table.
Organization | MoneyAmount | MoneyAmountAvg
-------------+-------------+---------------
ISD          | 500         |
ISD          | 500         |
ISD          | 500         |
QWE          | 250         |
ISD          | 500         |
QWE          | 250         |
OLP          | 800         |
ISD          | 500         |
I need the MoneyAmountAvg column to have a value of MoneyAmount / (# of times that organization shows up).
So for example, the MoneyAmountAvg column for the ISD rows would have a value of 100.
QWE would have 125 for each row in the MoneyAmountAvg column and OLP would have a value of 800 since it is there only once.
This is only an example table. The actual table is much bigger and has more organizations, but it has the same criteria. Some organizations have multiple rows, while others are there only once.
I just need a way for it to count how many times each organization is listed when I use an update statement for that organization's MoneyAmountAvg column. Hard coding it for each organization is definitely not an option since they can change at any moment.
Any help is appreciated.
Here is my answer:
select organization, moneyamount,
       moneyamount / count(*) over (partition by organization) as MoneyAmountAvg
from t
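On the sample data above, ISD appears five times, QWE twice, and OLP once, so every ISD row gets 500 / 5 = 100, every QWE row gets 250 / 2 = 125, and the single OLP row keeps 800 / 1 = 800, exactly the values the question asks for.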
This is a simple application of a window function. I think most of the other answers are producing the overall average.
For an update statement, simply do:
with toupdate as (
select organization, moneyamount,
moneyamount / count(*) over (partition by organization) as newval
from t
)
update toupdate
set MoneyAmountAvg = newval
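One caveat, assuming MoneyAmount is an integer column (the types aren't shown in the question): SQL Server does integer division, so a combination like 250 / 4 would silently truncate to 62. Multiplying by 1.0 first forces decimal division, and MoneyAmountAvg would need a decimal type to keep the fraction:

with toupdate as (
      select organization, moneyamount,
             -- 1.0 promotes the division to decimal (avoids integer truncation)
             moneyamount * 1.0 / count(*) over (partition by organization) as newval
      from t
)
update toupdate
set MoneyAmountAvg = newval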
Try something like this:
;WITH CTE AS
(
SELECT
Org, Moneyamount,
MoneyAvg = AVG(MoneyAmount) OVER(PARTITION BY Org),
OrgCount = COUNT(*) OVER (PARTITION BY Org)
FROM
dbo.YourTableHere
)
SELECT DISTINCT Org, MoneyAmount, OrgCount, MoneyAvg / OrgCount AS MoneyAmountAvg
FROM CTE
That seems to return what you're looking for.
You can do this using analytic functions:
SELECT Organization,
MoneyAmount,
MoneyAmountAvg = MoneyAmount / COUNT(*) OVER(PARTITION BY Organization)
FROM T
Example on SQL Fiddle
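If you want to reproduce this without the Fiddle, a minimal setup matching the sample data could look like this (the table name T is an assumption carried over from the query above):

CREATE TABLE T (Organization VARCHAR(10), MoneyAmount INT, MoneyAmountAvg INT);

INSERT INTO T (Organization, MoneyAmount) VALUES
('ISD', 500), ('ISD', 500), ('ISD', 500), ('QWE', 250),
('ISD', 500), ('QWE', 250), ('OLP', 800), ('ISD', 500);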
SELECT Organization, (MIN(MoneyAmount) / COUNT(*)) AS AvgMoneyAmount
FROM your_table
GROUP BY Organization
This works if MoneyAmount for a single Organization is always equal.
But I'd say: Refactor the table:
Organization | MoneyAmount | count
-------------+-------------+------
ISD          | 500         | 5
QWE          | 250         | 2
...
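A rough sketch of how that summary table could be generated from the existing data; SELECT ... INTO and the target name OrganizationCounts are my assumptions, and MIN works here because the amounts are equal within an organization:

SELECT Organization, MIN(MoneyAmount) AS MoneyAmount, COUNT(*) AS [count]
INTO OrganizationCounts
FROM your_table
GROUP BY Organization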
select example.Organization, example.MoneyAmount,
       example.MoneyAmount / group_table.OrgCount as MoneyAmountAvg
from example
inner join (select Organization, count(*) as OrgCount
            from example
            group by Organization) group_table
    on example.Organization = group_table.Organization
select
organization
,avg(MoneyAmount) / count(1) as MoneyAmountAvg
from tab
group by organization
The purpose of avg(MoneyAmount) is to handle possible different values of amount for the same organization.
I have a big data set in Redshift which my company will share with university students to analyze. I need to mask the real customer account numbers.
I've looked at the random function but there's one catch: some customers are repeated, so I need to retain that for the analysis to be useful. Also, with a random number there's a small possibility you would repeat account numbers, right?
How would you achieve this? Generate a new_random_id. It must be unique from all others in the table (there are over 4 million in the table), but must be the same for those rows where the actual account ID is the same.
+-------------------+---------------+---------+
| actual_account_id | new_random_id | status  |
+-------------------+---------------+---------+
| 100               | 123           | new     |
| 100               | 123           | upgrade |
| 200               | 249           | new     |
| 300               | 401           | upgrade |
+-------------------+---------------+---------+
I realize I could first generate a mapping table like this below, and then join to the main table, but it still doesn't solve the problem of possibly repeating new random IDs.
select distinct actual_account_id, cast(random()*1000000 as int) as new_random_id
into mapping_table
from t1;
I would create a mapping table using window functions:
select actual_account_id,
row_number() over (order by random()) as fake_account_id
from t1
group by actual_account_id;
This should be a meaningless sequential number.
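To put the mapping to use, you could materialize it and join back to the big table. A sketch, with mapping_table as an assumed name and status standing in for whatever columns you actually share:

create table mapping_table as
select actual_account_id,
       row_number() over (order by random()) as fake_account_id
from t1
group by actual_account_id;

select m.fake_account_id, t.status
from t1 t
join mapping_table m
     on m.actual_account_id = t.actual_account_id;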
Redshift might be a bit slow on the ROW_NUMBER() with no PARTITION BY. If performance is an issue, you can use something like this:
select actual_account_id,
       tmp * 100 + row_number() over (partition by tmp order by random()) as fake_account_number
from (select actual_account_id,
cast(random()*1000000 as int) as tmp
from t1
group by actual_account_id
) t;
I'm having quite a bit of trouble figuring out exactly how to rearrange a table. I have a large table that looks something like this:
+--------+-----------+
| NAME   | ACCOUNT # |
+--------+-----------+
| Nike   | 87        |
| Nike   | 12        |
| Adidas | 80        |
| Adidas | 21        |
+--------+-----------+
And I want to rearrange it to look like this:
+------+--------+
| Nike | Adidas |
+------+--------+
| 87   | 80     |
| 12   | 21     |
+------+--------+
But I can't seem to figure out how. I tried using PIVOT, but that only works with aggregate functions. I tried using a FOR LOOP as well, but couldn't get it to work just right.
You can do this in several ways, but they all begin by enumerating the rows. Here is an example using conditional aggregation:
select max(case when name = 'Nike' then account end) as Nike,
max(case when name = 'Adidas' then account end) as Adidas
from (select t.*,
row_number() over (partition by name order by account desc) as seqnum
from t
) t
group by seqnum;
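On the sample rows, the descending sort makes seqnum 1 pick up 87 (Nike) and 80 (Adidas), and seqnum 2 pick up 12 and 21, so the grouping yields exactly the layout you asked for:

Nike | Adidas
-----+-------
87   | 80
12   | 21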
Consider again a pivot solution, but first add a row number for a running count within each Name group. Below assumes an autonumber ID field:
SELECT * FROM
(
SELECT Name, "Account #",
(ROW_NUMBER() OVER(PARTITION BY Name ORDER BY ID)) GrpRowNum
/* ALT: (SELECT Count(*) FROM Table1 sub
* WHERE sub.Name = Table1.Name AND sub.ID <= Table1.ID) GrpRowNum */
FROM Table1
)
PIVOT
(
SUM("Account #")
FOR Name IN ('Nike', 'Adidas')
)
ORDER BY GrpRowNum;
However, for your ~200 items, you cannot easily render the PIVOT's IN clause without workarounds such as PIVOT XML output or stored procedures with PL/SQL. Alternatively, you could use a general-purpose language (Java, PHP, Python, R) to retrieve the SELECT DISTINCT Name FROM Table1 resultset into a vector/array, join the element values (collapsing or imploding the array) with quotes and comma separators, and drop the entire list into the IN clause.
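For illustration, here is a rough PL/SQL sketch of that dynamic approach. LISTAGG requires Oracle 11gR2 or later, and the variable names are mine:

-- Build the quoted IN list from the distinct names, then run the pivot dynamically.
DECLARE
    v_in_list VARCHAR2(4000);
    v_sql     VARCHAR2(4000);
    v_cursor  SYS_REFCURSOR;
BEGIN
    SELECT LISTAGG('''' || Name || '''', ', ') WITHIN GROUP (ORDER BY Name)
      INTO v_in_list
      FROM (SELECT DISTINCT Name FROM Table1);

    v_sql := 'SELECT * FROM '
          || '(SELECT Name, "Account #", '
          || 'ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID) GrpRowNum '
          || 'FROM Table1) '
          || 'PIVOT (SUM("Account #") FOR Name IN (' || v_in_list || ')) '
          || 'ORDER BY GrpRowNum';

    -- The caller (or client tool) fetches rows from this cursor.
    OPEN v_cursor FOR v_sql;
END;
/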
So I have this very inconsistent data, for example (just an example):
Manager | Associate | FTE  | Revenue
--------+-----------+------+--------
Bob     | James     | Y    | 500
Bob     | James     | NULL | 100
Bob     | James     | Y    | 200
Kelly   | Rick      | N    | 200
Kelly   | Rick      | N    | 500
Kelly   | Rick      | NULL | 300
So my goal was to sum up the revenue, but the problem is that in the GROUP BY the NULLs split the groups apart. So I want to write an update statement that basically says: "It looks like James and Bob are both FTE, so update that to Y; Kelly and Rick are not, so update that to N."
How can I fix this? I'm using MS Access, and of course my table is a lot bigger, with a lot of different name combos.
You can "impute" the value by using an aggregation function. The following query aggregates by manager/associate and takes the maximum value of fte. This is then joined back to the original data to do the calculation:
select ma.fte, sum(Revenue)
from table as t inner join
(select manager, associate, max(fte) as fte
from table as t
group by manager, associate
) as ma
on t.manager = ma.manager and
t.associate = ma.associate
group by ma.fte;
EDIT:
Immediately after posting this, I realized the join is not necessary. Two aggregations are sufficient:
select ma.fte, sum(Revenue)
from (select manager, associate, max(fte) as fte, sum(Revenue) as Revenue
from table as t
group by manager, associate
) as ma
group by ma.fte;
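On the sample data, Bob/James rolls up to fte = Y with 500 + 100 + 200 = 800, and Kelly/Rick rolls up to fte = N with 200 + 500 + 300 = 1000.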
You haven't given the table or key column names, which makes it a bit harder; I've used {table} as a placeholder below, and the Manager/Associate pair serves as the matching key.
With the nulls, many SQL dialects have an "IfNull" function. MS Access doesn't have that exact name, though its Nz() function plays the same role inside Access itself. You can also get the same effect this way:
IIF(ISNULL(column),0,column)
You'd use that in a SELECT as so:
SELECT IIF(ISNULL(Revenue),0,Revenue) FROM ...
For a one-off fix you could do this:
UPDATE {table} SET Revenue=0 WHERE Revenue IS NULL;
Doing a join to get the FTE from another row is more complex, and I don't have Access handy to check the exact limits and syntax. The easy-to-understand way is a nested query:
UPDATE {table} a SET FTE = (SELECT max(FTE) FROM {table} b WHERE b.FTE IS NOT NULL AND a.Manager = b.Manager AND a.Associate = b.Associate)
The max() function works here because it ignores nulls, whereas some other functions return null if you pass a null in.
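One caution, based on general Access behavior rather than a test of this exact statement: Access often rejects an UPDATE whose subquery contains an aggregate ("Operation must use an updatable query"). If you run into that, the DMax domain function is the usual workaround. A sketch, keeping the {table} placeholder:

UPDATE {table}
SET FTE = DMax("FTE", "{table}",
               "Manager='" & [Manager] & "' AND Associate='" & [Associate] & "'")
WHERE FTE IS NULL;

Like max(), DMax ignores nulls, so it pulls the Y or N from the populated rows.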
I'm not really much into SQL & Apex, but I need one statement and I would really appreciate your help on this.
The syntax of Apex pie charts is this:
SELECT link, label, value
My table looks like this simple sketch:
+------+-----------+-----------+
| ID   | Company   | Item      |
+------+-----------+-----------+
| 1    | AAA       | Some      |
| 2    | BB        | Stuff     |
| 3    | BB        | Not       |
| 4    | CCCC      | Important |
| 5    | AAA       | For       |
| 6    | DDDDD     | Question? |
+------+-----------+-----------+
I want to show each company's percentage share.
Problem: all companies with fewer than 5 items should be combined into one slice, "other". The difficulty for me is combining the "unimportant" companies.
So far, my statement looks like this:
SELECT null link,
       company label,
       COUNT(ID) value
FROM table
GROUP BY company
HAVING COUNT(ID) > 5
Here's a wonderful diagram-sketch. :D
Thank you for your ideas!
I've not got SQL Developer in front of me but this (or a close variation) should work for you:
WITH company_count
AS (
SELECT CASE
WHEN count(*) OVER (PARTITION BY company) < 5
THEN 'Other'
ELSE company
END AS company_name,
id
FROM tablename
),
company_group
AS (
SELECT company_name,
count(id) item_count
FROM company_count
GROUP BY company_name
)
SELECT NULL AS link,
company_name AS label,
item_count AS value
FROM company_group;
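Worth noting: in the six-row sketch every company has fewer than 5 items, so this query would lump everything into 'Other'; the split only becomes visible on the real, larger table.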
Hope it helps!
Okay guys, I found the answer for my use case. It's quite similar to Ollie's answer. Thanks again for the help!
WITH sq1 AS (
    SELECT company, COUNT(*) AS count
    FROM (SELECT CASE
                     WHEN COUNT(*) OVER (PARTITION BY company) >= 5 THEN company
                     ELSE 'other'
                 END AS company, id
          FROM table)
    GROUP BY company
)
SELECT null link, company label, count value
FROM sq1
ORDER BY count DESC
The title was hard to word, but the question is pretty simple. I searched all over here and could not find something for my specific issue, so here it is. I'm using Microsoft SQL Server Management Studio 2010.
The table currently looks like this:
| Value | Product Name |
| 300   | Bike         |
| 400   | Bike         |
| 300   | Car          |
| 300   | Car          |
I need the table to show me the sum of Values where the Product Name matches, like this:
| TOTAL | ProductName |
| 700   | Bike        |
| 600   | Car         |
I've tried a simple
SELECT
SUM(Value) AS 'Total'
,ProductName
FROM TableX
But the above doesn't work. I end up getting the sum of all values in the column. How can I sum based on the product name matching?
Thanks!
SELECT SUM(Value) AS 'Total', [Product Name]
FROM TableX
GROUP BY [Product Name]
SQL Fiddle Example
Any time you use an aggregate function (SUM, MIN, MAX, ...) alongside other columns in the SELECT list, you must use GROUP BY, which tells the database which column(s) to group the aggregate by. Furthermore, any column that is not inside an aggregate must appear in the GROUP BY clause.
For example, the following syntax is invalid because it selects a column (col2) that is not in the GROUP BY (even though MySQL allows this):
SELECT col1, col2, SUM(col3)
FROM table
GROUP BY col1
The solution to your question would be:
SELECT ProductName, SUM(Value) AS 'Total'
FROM TableX
GROUP BY ProductName