SQL AVG function - sql

What is the correct syntax to display the data from all records using GROUP (or DISTINCT) and AVG?
My data looks like this:
TABLE NAME: Pensje
COLUMNS:
ID, Company, Position, Salary
Example:
ID Company Position Salary
1 Atari Designer 24000
2 Atari Designer 20000
3 Atari Programmer 35000
4 Amiga Director 40000
I need to arrange data in this way (only records from 1 company need to be displayed)
Position a , average Salary from all the records with same Company and Position
Position b , average Salary from all the records with same Company and Position
Ex. Atari
Designer, 22000
Programmer, 35000
My SQL looks like this:
SELECT Position, AVG(Salary)
FROM Pensje
WHERE Company = %s
GROUP BY Position
ORDER BY Position ASC
In the above example, "Position" is displayed correctly but "Salary" is not displayed at all. After removing AVG(), Salary is displayed, but only for the first position found in the table.
Many thanks for taking your time to help me!

Posting an answer from MCP_infiltrator which helped:
In your WHERE clause you should be using WHERE Company LIKE '%s', not =. The way you have it written should produce the results you want, so I do not understand why it is not working properly for you. You may also want to alias your AVG() result, like AVG(Salary) AS Average_Salary; otherwise your column will come back with no name.
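The aliasing advice can be tried out with a small sketch. It assumes Python's built-in sqlite3 module and an in-memory copy of the sample rows, which are not part of the original question; sqlite3 uses ? as its placeholder where the asker's driver uses %s. The function name average_salaries is hypothetical.

```python
import sqlite3

# In-memory stand-in for the Pensje table (assumption: the real table
# lives in another database and is queried through a different driver).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows reading columns by name
conn.execute(
    "CREATE TABLE Pensje (ID INTEGER, Company TEXT, Position TEXT, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Pensje VALUES (?, ?, ?, ?)",
    [
        (1, "Atari", "Designer", 24000),
        (2, "Atari", "Designer", 20000),
        (3, "Atari", "Programmer", 35000),
        (4, "Amiga", "Director", 40000),
    ],
)


def average_salaries(company):
    """Return (Position, average salary) pairs for one company."""
    rows = conn.execute(
        """
        SELECT Position, AVG(Salary) AS Average_Salary
        FROM Pensje
        WHERE Company = ?
        GROUP BY Position
        ORDER BY Position ASC
        """,
        (company,),  # bound parameter instead of string formatting
    ).fetchall()
    # The alias gives the aggregate a stable name to read it by.
    return [(row["Position"], row["Average_Salary"]) for row in rows]


result = average_salaries("Atari")
print(result)  # [('Designer', 22000.0), ('Programmer', 35000.0)]
```

Without the alias, the aggregate column is typically named after the expression itself (for example "AVG(Salary)"), so code that looks it up as "Salary" finds nothing.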

Related

How to group by data in one column and distribute it in another column in HiveSQL?

I have the following data:
CompanyID  Department  No of People  Country
45390      HR          100           UK
45390      Service     250           UK
98712      Service     300           US
39284      Admin       142           Norway
85932      Admin       260           Germany
I wish to know how many people belong to the same department from different countries?
Required Output
Department  No of People  Country
HR          100           UK
Service     250           UK
            300           US
Admin       142           Norway
            260           Germany
I was able to get the data, but the Department was repeated with this query:
select Department, Country, count(Department) from dataset
group by Country, Department
order by Department
How can I get the desired output?
The result set that you are producing is not really a relational result set. Why? Because rows depend on what is in the "previous" row. And in a relational database, there is no such thing as a "previous" row. This type of processing is often handled in the application layer.
Of course, SQL can do what you want. You just need to be careful:
select (case when 1 = row_number() over (partition by Department order by Country)
then Department
end) as Department,
Country, count(*) as num_people
from dataset
group by Country,Department
order by Department, Country;
Note that the order by needs to match the window function clause to be sure that what row_number() considered to be the first row is really the first row in the result set.
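The row_number() approach can be checked with a small sketch. It assumes Python's sqlite3 module (SQLite 3.25 or later for window functions) in place of Hive, a column named NumPeople standing in for "No of People", and SUM over that column to get the head count; none of these come from the original answer. The output alias department_label is deliberately different from the underlying column, so that ORDER BY sorts on the real Department values and not on the blanked-out label.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset "
    "(CompanyID INTEGER, Department TEXT, NumPeople INTEGER, Country TEXT)"
)
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?, ?)",
    [
        (45390, "HR", 100, "UK"),
        (45390, "Service", 250, "UK"),
        (98712, "Service", 300, "US"),
        (39284, "Admin", 142, "Norway"),
        (85932, "Admin", 260, "Germany"),
    ],
)

rows = conn.execute(
    """
    WITH totals AS (
        -- Aggregate first, then decorate the aggregated rows.
        SELECT Department, Country, SUM(NumPeople) AS num_people
        FROM dataset
        GROUP BY Department, Country
    )
    SELECT CASE
               WHEN row_number() OVER (PARTITION BY Department
                                       ORDER BY Country) = 1
               THEN Department
           END AS department_label,  -- NULL on every row but the first
           Country,
           num_people
    FROM totals
    ORDER BY Department, Country     -- same ordering as the window
    """
).fetchall()

for row in rows:
    print(row)
```

With the sample data this prints the department name only on the first row of each department: ('Admin', 'Germany', 260), (None, 'Norway', 142), ('HR', 'UK', 100), ('Service', 'UK', 250), (None, 'US', 300).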

match tables with intermediate mapping table (fuzzy joins with similar strings)

I'm using BigQuery.
I have two simple tables with "bad" data quality from our systems. One represents revenue and the other production rows for bus journeys.
I need to match every journey to a revenue transaction but I only have a set of fields and no key and I don't really know how to do this matching.
This is a sample of the data:
Revenue
Year, Agreement, Station_origin, Station_destination, Product
2020, 123123, London, Manchester, Qwerty
Journeys
Year, Agreement, Station_origin, Station_destination, Product
2020, 123123, Kings Cross, Piccadilly Gardens, Qwer
2020, 123123, Kings Cross, Victoria Station, Qwert
2020, 123123, London, Manchester, Qwerty
Every station has a maximum of 9 alternative names and these are stored in a "station" table.
Stations
Station Name, Station Name 2, Station Name 3,...
London, Kings Cross, Euston,...
Manchester, Piccadilly Gardens, Victoria Station,...
I would like to test matching or joining the tables first with the original fields. This will generate some matches, but many journeys will remain unmatched. For the unmatched revenue rows, I would like to change the product name (shorten it to two letters, which may produce many matches from the production table) and then the station names, first changing station_origin and then station_destination. A shorter product name could produce many matches, in which case I want the row from the production table with the most common product.
Something like this:
1. Do a direct match. That is, I can use the fields as they are in the tables.
2. Do a match where the revenue.product is changed by shortening it to two letters. substr(product,0,2)
3. Change the rev.station_origin to the first alternative, Station Name 2, and then try a join. The product or other station are not changed.
4. Change the rev.station_origin to the first alternative, Station Name 2, and then try a join. The product is changed as above with a substr(product,0,2) but rev.station_destination is not changed.
5. Change the rev.station_destination to the first alternative, Station Name 2, and then try a join. The product or other station are not changed.
I was told that maybe I should create an intermediate table with all combinations of stations and products and let a rank column decide the order. The station names in the station's table are in order of importance so "station name" is more important than "station name 2" and so on.
I started to do a query with a subquery per rank and do a UNION ALL but there are so many combinations that there must be another way to do this.
Don't know if this makes any sense but I would appreciate any help or ideas to do this in a better way.
Cheers,
Cris
To implement a complex joining strategy with approximate matching, it might make more sense to define the strategy within JavaScript - and call the function from a BigQuery SQL query.
For example, the following query performs these steps:
1. Take the top 200 male names in the US.
2. Find if one of the top 200 female names matches.
3. If not, look for the most similar female name within the options.
Note that the logic to choose the closest option is encapsulated within the JS UDF fhoffa.x.fuzzy_extract_one(). See https://medium.com/#hoffa/new-in-bigquery-persistent-udfs-c9ea4100fd83 to learn more about this.
WITH data AS (
SELECT name, gender, SUM(number) c
FROM `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY 1,2
), top_men AS (
SELECT * FROM data WHERE gender='M'
ORDER BY c DESC LIMIT 200
), top_women AS (
SELECT * FROM data WHERE gender='F'
ORDER BY c DESC LIMIT 200
)
SELECT name male_name,
COALESCE(
(SELECT name FROM top_women WHERE name=a.name)
, fhoffa.x.fuzzy_extract_one(name, ARRAY(SELECT name FROM top_women))
) female_version
FROM top_men a
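The intermediate-table idea mentioned in the question can also be sketched in plain SQL. This is only an illustration under several assumptions: it runs on Python's sqlite3 module (SQLite 3.25 or later) and not on BigQuery, the wide Stations table has been unpivoted into a hypothetical station_alias table with one row per alternative name, and the score (sum of the two alias positions plus 1 for a non-identical product) is an invented ranking rule, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE revenue  (year INT, agreement INT, origin TEXT,
                           destination TEXT, product TEXT);
    CREATE TABLE journeys (id INT, year INT, agreement INT, origin TEXT,
                           destination TEXT, product TEXT);
    -- Long form of the Stations table: alias_rank 1 is "Station Name",
    -- 2 is "Station Name 2", and so on (hypothetical layout).
    CREATE TABLE station_alias (canonical TEXT, alias TEXT, alias_rank INT);

    INSERT INTO revenue VALUES (2020, 123123, 'London', 'Manchester', 'Qwerty');
    INSERT INTO journeys VALUES
        (1, 2020, 123123, 'Kings Cross', 'Piccadilly Gardens', 'Qwer'),
        (2, 2020, 123123, 'Kings Cross', 'Victoria Station',   'Qwert'),
        (3, 2020, 123123, 'London',      'Manchester',         'Qwerty');
    INSERT INTO station_alias VALUES
        ('London', 'London', 1), ('London', 'Kings Cross', 2),
        ('London', 'Euston', 3),
        ('Manchester', 'Manchester', 1),
        ('Manchester', 'Piccadilly Gardens', 2),
        ('Manchester', 'Victoria Station', 3);
    """
)

matches = conn.execute(
    """
    WITH candidates AS (
        SELECT j.id    AS journey_id,
               r.rowid AS revenue_id,
               -- Lower score = closer to a direct match.
               o.alias_rank + d.alias_rank + (j.product <> r.product) AS score
        FROM journeys j
        JOIN station_alias o ON o.alias = j.origin
        JOIN station_alias d ON d.alias = j.destination
        JOIN revenue r
          ON r.year = j.year
         AND r.agreement = j.agreement
         AND r.origin = o.canonical
         AND r.destination = d.canonical
         AND substr(r.product, 1, 2) = substr(j.product, 1, 2)
    ),
    ranked AS (
        SELECT *,
               row_number() OVER (PARTITION BY journey_id
                                  ORDER BY score) AS rn
        FROM candidates
    )
    SELECT journey_id, revenue_id, score
    FROM ranked
    WHERE rn = 1          -- keep the best candidate per journey
    ORDER BY journey_id
    """
).fetchall()

print(matches)  # [(1, 1, 5), (2, 1, 6), (3, 1, 2)]
```

One join against the alias table replaces the per-rank UNION ALL branches, and the ranking rule lives in a single expression that can be adjusted without touching the join.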

SQL Query Totals

I'm trying to write a query that calculates the average profit per employee for several projects.
I have a table that has employee names, what project they are working on, and how much profit they bring to their specific project each day.
My first query gives 3 fields - The project name, the sum of all the profits the employees bring to the project, and the number of employees in the project.
In my second query I am trying to display two fields: the project name and the average profit per employee that each project makes.
SELECT SAYSquery.ProjectName, SUM(SAYSquery.Profit) AS SumOfProfit, Count(SAYSquery.[EmpFirstName]) AS NumberOfEmps
FROM SAYSquery
WHERE profit > 0
GROUP BY SAYSquery.ProjectName;
SELECT SAYSqueryNIPE.[ProjectName], SAYSqueryNIPE.[SumOfProfit]/[NumberOfEmps] AS total
FROM SAYSqueryNIPE
GROUP BY SAYSqueryNIPE.[ProjectName], SAYSqueryNIPE.[SumOfProfit]/[NumberOfEmps];
Unfortunately, my second query is giving me the same average profit for every project and I'm not sure why. Any help would be much appreciated.
EDIT:
Query 1 reads:
**Employee Name | Sell Rate | Renumeration | Profit (Sell-Renumeration) | Project Name**
Query 2 reads:
**PROJECT NAME | SumofProfit | NumberofEmployees**
Project X | $1500 | 3 employees
Query 3 reads:
**PROJECT NAME | TOTAL**
Project X | $500 (Average profit per employee)
The problem is in your first query where you count the number of employees. As written in the question, you are returning a count of rows, not employees who worked on the project. You need to use count distinct. I'd also recommend not counting on EmpFirstName. If you have more than one employee with the same first name, the query won't give you correct results. It would be better to use a unique employee identifier instead of their first name.
SELECT SAYSquery.ProjectName, SUM(SAYSquery.Profit) AS SumOfProfit,
Count(distinct SAYSquery.[EmpFirstName]) AS NumberOfEmps
FROM SAYSquery
WHERE profit > 0
GROUP BY SAYSquery.ProjectName
You could wrap the whole thing up into a single query instead of two or three, as described in your question.
SELECT SAYSquery.ProjectName, SUM(SAYSquery.Profit)/Count(distinct SAYSquery.[EmpFirstName]) AS AvgProfitPerEmployee
FROM SAYSquery
WHERE profit > 0
GROUP BY SAYSquery.ProjectName
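The difference between counting rows and counting distinct employees can be seen in a small sketch. It assumes Python's sqlite3 module, invented sample rows, and a hypothetical EmpID column as the unique employee identifier recommended above; the asker's real schema is not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SAYSquery (EmpID INTEGER, ProjectName TEXT, Profit REAL)"
)
# One row per employee per day (made-up data).
conn.executemany(
    "INSERT INTO SAYSquery VALUES (?, ?, ?)",
    [
        (1, "Project X", 300.0),
        (1, "Project X", 200.0),
        (2, "Project X", 500.0),
        (3, "Project X", 500.0),
        (4, "Project Y", 100.0),
        (4, "Project Y", 300.0),
    ],
)

template = """
    SELECT ProjectName, SUM(Profit) / {divisor} AS AvgProfit
    FROM SAYSquery
    WHERE Profit > 0
    GROUP BY ProjectName
    ORDER BY ProjectName
"""

# COUNT(EmpID) counts rows, so an employee with two days is counted twice.
per_row = conn.execute(template.format(divisor="COUNT(EmpID)")).fetchall()

# COUNT(DISTINCT EmpID) counts each employee once per project.
per_employee = conn.execute(
    template.format(divisor="COUNT(DISTINCT EmpID)")
).fetchall()

print(per_row)       # [('Project X', 375.0), ('Project Y', 200.0)]
print(per_employee)  # [('Project X', 500.0), ('Project Y', 400.0)]
```

Project X has three employees and a total profit of 1500, so only the distinct count gives the expected 500 per employee.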

oracle - sql query select max from each base

I'm trying to write a query that finds the top balance at each base. Balance is in one table and bases are in another table.
This is the existing query I have. It returns all the results, but I need a way to limit it to the single top result per baseID.
SELECT o.names.name, t.accounts.bidd.baseID, MAX(t.accounts.balance)
FROM order o, table(c.accounts) t
WHERE t.accounts.acctype = 'verified'
GROUP BY o.names.name, t.accounts.bidd.baseID;
accounts is a nested table.
This is the output:
Name accounts.BIDD.baseID MAX(T.accounts.BALANCE)
--------------- ------------------------- ---------------------------
Jerard 010 1251.21
john 012 3122.2
susan 012 3022.2
fin 012 3022.2
dan 010 1751.21
What I want is to calculate the highest balance for each baseID and display only one record for that baseID.
So the output would display only john for baseID 012, because he has the highest balance.
Any pointers in the right direction would be fantastic.
I think the problem is caused by the "Name" column. Since three names are mapped to one baseID (012), the query treats all three records as distinct groups and aggregates them individually instead of together.
Try leaving the "Name" column out of both the SELECT list and the GROUP BY clause.
SELECT t.accounts.bidd.baseID, MAX(t.accounts.balance)
FROM order o, table(c.accounts) t
WHERE t.accounts.acctype = 'verified'
GROUP BY t.accounts.bidd.baseID;
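Dropping the name gives one row per baseID but no longer shows who holds the top balance. If the name is still wanted, one common alternative is to rank the rows within each base and keep the first. The sketch below assumes Python's sqlite3 module (SQLite 3.25 or later) and a flat, hypothetical accounts table filled with the rows from the question's output in place of the Oracle nested table; Oracle also provides ROW_NUMBER() as an analytic function, so the same shape of query should carry over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Flattened stand-in for the nested accounts data (assumption).
conn.execute("CREATE TABLE accounts (name TEXT, base_id TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        ("Jerard", "010", 1251.21),
        ("john", "012", 3122.2),
        ("susan", "012", 3022.2),
        ("fin", "012", 3022.2),
        ("dan", "010", 1751.21),
    ],
)

top_per_base = conn.execute(
    """
    SELECT base_id, name, balance
    FROM (
        SELECT base_id, name, balance,
               row_number() OVER (PARTITION BY base_id
                                  ORDER BY balance DESC) AS rn
        FROM accounts
    ) AS ranked
    WHERE rn = 1        -- highest balance within each base
    ORDER BY base_id
    """
).fetchall()

print(top_per_base)  # [('010', 'dan', 1751.21), ('012', 'john', 3122.2)]
```

If two accounts tie for the top balance in a base, row_number() picks one of them arbitrarily; rank() would return both.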

crystal reports group and show only 1 record but sum other records

I have a grouped column that is a string. The column keeps repeating. I only want it to be displayed once per group, with the remaining columns summed for that group.
Database table structure:
-personname-id-salary-
I want to group by personname (display once) and sum salary.
The output in Crystal right now:
jon 10,000
jon 10,000
bob 50,000
bob 50,000
greg 10,000
greg 10,000
It should be:
jon 10,000
bob 50,000
greg 10,000
I am only grouping by personname.
Here is my group selection code (none of these fixes the above problem):
groupName #1({table.personname}) = NthLargest(1,{table.personname});
OR
{table.personname} = NthLargest(1,{table.personname},{table.id});
OR
{table.personname} = Minimum({table.personname});
It looks as though you are currently displaying id and salary in the detail section (or possibly the group header and group footer).
You need to suppress the detail section, and display the id and the sum of the salary in the group footer.
If you need to add the id and/or the sum of the salary to the group footer, simply drag and drop them from where they are currently displayed in the report.
To suppress the detail section, right-click in the grey area to the left of the page layout in the design tab for the Details section and select Suppress (No Drill-Down). (You should also do the same for the group header section.)