updating nulls based on column - sql

I have some very inconsistent records, for example (just an example):
Manager | Associate | FTE | Revenue
Bob | James | Y | 500
Bob | James | NULL | 100
Bob | James | Y | 200
Kelly | Rick | N | 200
Kelly | Rick | N | 500
Kelly | Rick | NULL | 300
My goal is to sum up the revenue, but the problem is that in the GROUP BY the NULLs split the groups apart. So I want to write an UPDATE statement that basically says "well, it looks like James and Bob are both FTE, so let's update that to Y, and Kelly and Rick are not, so update that to N."
How can I fix this? I'm using MS Access, and of course my real table is a lot bigger, with a lot of different name combinations.

You can "impute" the value by using an aggregation function. The following query aggregates by manager/associate and takes the maximum value of fte. This is then joined back to the original data to do the calculation:
select ma.fte, sum(Revenue)
from table as t inner join
(select manager, associate, max(fte) as fte
from table as t
group by manager, associate
) as ma
on t.manager = ma.manager and
t.associate = ma.associate
group by ma.fte;
EDIT:
Immediately after posting this, I realized the join is not necessary. Two aggregations are sufficient:
select ma.fte, sum(Revenue)
from (select manager, associate, max(fte) as fte, sum(Revenue) as Revenue
from table as t
group by manager, associate
) as ma
group by ma.fte;
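The two-level aggregation above can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than MS Access, and the table name records is a stand-in for the real table, so treat it as an illustration of the idea under those assumptions and not as Access-specific syntax.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
# "records" is a hypothetical table name chosen for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (manager TEXT, associate TEXT, fte TEXT, revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        ("Bob", "James", "Y", 500),
        ("Bob", "James", None, 100),
        ("Bob", "James", "Y", 200),
        ("Kelly", "Rick", "N", 200),
        ("Kelly", "Rick", "N", 500),
        ("Kelly", "Rick", None, 300),
    ],
)

# Inner query: one row per manager/associate pair; MAX(fte) skips NULLs,
# so each pair gets its known fte value.
# Outer query: re-aggregate the per-pair totals by that imputed value.
rows = conn.execute(
    """
    SELECT ma.fte, SUM(ma.revenue) AS revenue
    FROM (SELECT manager, associate, MAX(fte) AS fte, SUM(revenue) AS revenue
          FROM records
          GROUP BY manager, associate) AS ma
    GROUP BY ma.fte
    ORDER BY ma.fte
    """
).fetchall()
print(rows)
```

With the sample data, the NULL rows are folded into the Y and N totals instead of forming a group of their own.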

You haven't given the primary key columns, which makes it a bit harder. I've called it {id} below.
With the nulls, many SQL dialects have an "IfNull" function, but MS Access SQL does not (inside Access itself, Nz(column, 0) plays that role). You can get the same effect this way:
IIF(ISNULL(column),0,column)
You'd use that in a SELECT as so:
SELECT IIF(ISNULL(Revenue),0,Revenue) FROM ...
For a one-off fix you could do this:
UPDATE {table} SET Revenue=0 WHERE Revenue IS NULL;
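Doing a join to get the FTE from another row is more complex, and I don't have Access handy to see just what the limits and syntax are. The easy-to-understand way is a nested query. Note that {id} here has to stand for the columns that identify the Manager/Associate pair; a per-row key would only ever match the row being updated and leave the NULL in place:
UPDATE {table} a SET FTE = (SELECT max(b.FTE) FROM {table} b WHERE b.FTE IS NOT NULL AND a.{id} = b.{id})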
The max() function works here because it ignores nulls, where some other functions return null if you pass a null in.
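As a rough check of the nested-query idea, here is a sketch run against SQLite through Python's sqlite3 module. The table name records is made up for the example, the correlation is done on manager and associate, and Access may well accept a different syntax or reject this form of UPDATE, so this only demonstrates the logic.

```python
import sqlite3

# Hypothetical table "records" with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (manager TEXT, associate TEXT, fte TEXT, revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        ("Bob", "James", "Y", 500),
        ("Bob", "James", None, 100),
        ("Bob", "James", "Y", 200),
        ("Kelly", "Rick", "N", 200),
        ("Kelly", "Rick", "N", 500),
        ("Kelly", "Rick", None, 300),
    ],
)

# Fill each NULL fte with the non-NULL value found on other rows of the
# same manager/associate pair. MAX() ignores NULLs in the subquery.
conn.execute(
    """
    UPDATE records
    SET fte = (SELECT MAX(b.fte)
               FROM records AS b
               WHERE b.manager = records.manager
                 AND b.associate = records.associate)
    WHERE fte IS NULL
    """
)

remaining_nulls = conn.execute(
    "SELECT COUNT(*) FROM records WHERE fte IS NULL"
).fetchone()[0]

# After the update, grouping by fte no longer splits the pairs apart.
totals = conn.execute(
    """
    SELECT manager, associate, fte, SUM(revenue)
    FROM records
    GROUP BY manager, associate, fte
    ORDER BY manager
    """
).fetchall()
print(remaining_nulls, totals)
```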

PostgreSQL Count DISTINCT from one column when grouped by another

I have a single table that looks like the following (dumbed down):
userid | action | userstate
-----------------------------------------------------
1 | click | Maryland
2 | press | Delaware
3 | jog | New York
3 | leap | New York
What I'm trying to query is "number of users doing ANY action, per state"
So the result would be:
state | users_acting
---------------------
Maryland | 1
Delaware | 1
New York | 1
Note that each individual user belongs to only one state.
I can't get the mix of distinct users correct with grouping by state. I can't
SELECT DISTINCT (userid), COUNT(userid) FROM data GROUP BY state
because the distinct column needs to be in the group by, which I don't want to actually do, not to mention problems w/ the select clause.
Thanks for any thoughts.
Just found out that there's a COUNT(DISTINCT ...) option, which doesn't require the distinct column to be placed in the GROUP BY clause.
SELECT COUNT(DISTINCT userid) FROM data GROUP BY state
That does the trick.
You can try out the below format
SELECT COUNT(DISTINCT userid) FROM data GROUP BY state
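To see the difference between COUNT and COUNT(DISTINCT ...) on the sample data, here is a small sketch. It runs on SQLite via Python's sqlite3 module instead of PostgreSQL, groups by the userstate column from the sample table, and also selects the state so the output matches the desired result; those choices are assumptions made for the illustration.

```python
import sqlite3

# Sample table from the question, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (userid INTEGER, action TEXT, userstate TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        (1, "click", "Maryland"),
        (2, "press", "Delaware"),
        (3, "jog", "New York"),
        (3, "leap", "New York"),
    ],
)

# COUNT(userid) counts rows (actions); COUNT(DISTINCT userid) counts users.
rows = conn.execute(
    """
    SELECT userstate,
           COUNT(userid)          AS actions,
           COUNT(DISTINCT userid) AS users_acting
    FROM data
    GROUP BY userstate
    ORDER BY userstate
    """
).fetchall()
print(rows)
```

User 3 has two actions in New York, so the plain count is 2 there while the distinct count is 1.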

Novice seeking help, Max Aggregate not returning expected results

I'm still very new to MS-SQL. I have a simple table and a query that is getting the best of me. I know it will be something fundamental I'm overlooking.
I've changed the field names but the idea is the same.
So the idea is that every time someone signs up they get a RegID, Name, and Team. The names are unique, so for below yes John changed teams. And that's my trouble.
Football Table
+------------+----------+---------+
| Max_RegID | Name | Team |
+------------+----------+---------+
| 100 | John | Red |
| 101 | Bill | Blue |
| 102 | Tom | Green |
| 103 | John | Green |
+------------+----------+---------+
With the query at the bottom using the Max_RegID, I was expecting to get back only one record.
+------------+----------+---------+
| Max_RegID | Name | Team |
+------------+----------+---------+
| 103 | John | Green |
+------------+----------+---------+
Instead I get back the result below, which seems to include the Max_RegID, but one for each team. What am I doing wrong?
+------------+----------+---------+
| Max_RegID | Name | Team |
+------------+----------+---------+
| 100 | John | Red |
| 103 | John | Green |
+------------+----------+---------+
My Query
SELECT
Max(Football.RegID) AS Max_RegID,
Football.Name,
Football.Team
FROM
Football
GROUP BY
Football.RegID,
Football.Name,
Football.Team
EDIT* Removed the WHERE statement
The reason you're getting the results that you are is because of the way you have your GROUP BY clause structured.
When you're using any aggregate function, MAX(X), SUM(X), COUNT(X), or what have you, you're telling the SQL engine that you want the aggregate value of column X for each unique combination of the columns listed in the GROUP BY clause.
In your query as written, you're grouping by all three of the columns in the table, telling the SQL engine that each tuple is unique. Therefore the query is returning ALL of the values, and you aren't actually getting the MAX of anything at all.
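The effect of the GROUP BY list can be reproduced with a short sketch. It uses SQLite through Python's sqlite3 module rather than SQL Server, and it assumes the column is called RegID as in the question's query.

```python
import sqlite3

# Sample Football table from the question, in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Football (RegID INTEGER PRIMARY KEY, Name TEXT, Team TEXT)")
conn.executemany(
    "INSERT INTO Football VALUES (?, ?, ?)",
    [(100, "John", "Red"), (101, "Bill", "Blue"), (102, "Tom", "Green"), (103, "John", "Green")],
)

# Grouping by every column makes each row its own group,
# so MAX(RegID) is just the RegID of that single row.
by_all = conn.execute(
    """
    SELECT MAX(RegID), Name, Team
    FROM Football
    GROUP BY RegID, Name, Team
    ORDER BY RegID
    """
).fetchall()

# Grouping by Name alone gives one row per name with its highest RegID.
by_name = conn.execute(
    "SELECT MAX(RegID), Name FROM Football GROUP BY Name ORDER BY Name"
).fetchall()

print(by_all)
print(by_name)
```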
What you actually want in your results is the maximum RegID for each distinct value in the Name column and also the Team that goes along with that (RegID,Name) combination.
To accomplish that you need to find the MAX(RegID) for each Name in an initial data set, and then use that list of RegIDs to add the values for Name and Team in a secondary data set.
Caveat (per comments from @HABO): This is premised on the assumption that RegID is a unique number (an IDENTITY column, value from a SEQUENCE, or something of that sort). If there are duplicate values, this will fail.
The most straightforward way to accomplish that is with a sub-query. The sub-query below gets your unique RegIDs, then joins to the original table to add the other values.
SELECT
f.RegID
,f.Name
,f.Team
FROM
Football AS f
JOIN
(--The sub-query, sq, gets the list of IDs
SELECT
MAX(f2.RegID) AS Max_RegID
FROM
Football AS f2
GROUP BY
f2.Name
) AS sq
ON
sq.Max_RegID = f.RegID;
EDIT: Sorry. I just re-read the question. To get just the single record for the MAX(RegID), just take the GROUP BY out of the sub-query, and you'll just get the current maximum value, which you can use to find the values in the rest of the columns.
SELECT
f.RegID
,f.Name
,f.Team
FROM
Football AS f
JOIN
(--The sub-query, sq, now gets the MAX ID
SELECT
MAX(f2.RegID) AS Max_RegID
FROM
Football AS f2
) AS sq
ON
sq.Max_RegID = f.RegID;
Use row_number()
select * from
(SELECT
Football.RegID AS Max_RegID,
Football.Name,
Football.Team, row_number() over(partition by name order by Football.RegID desc) as rn
FROM
Football
WHERE
Football.Name = 'John')a
where rn=1
You can simply edit your query this way:
SELECT *
FROM
Football f
WHERE
f.Name = 'John' and
f.Max_RegID = (SELECT Max(f2.Max_RegID) FROM Football f2 WHERE f2.Name = 'John'
)
or
if sql server simply use this
select top 1 * from Football f
where f.Name = 'John'
order by Max_RegID desc
or
if mysql then
select * from Football f
where f.Name = 'John'
order by Max_RegID desc
Limit 1
You need self join :
select f1.*
from Football f inner join
Football f1
on f1.name = f.name
where f.Max_RegID = 103;
After revisiting the question, the sample data suggests a subquery instead:
select f.*
from Football f
where name = (select top (1) f1.name
from Football f1
order by f1.Max_RegID desc
);

Adding another column based on different criteria (SQL-server)

I do quite a bit of data analysis and use SQL on a daily basis but my queries are rather simple, usually pulling a lot of data which I thereafter manipulate in excel, where I'm a lot more experienced.
This time though I'm trying to generate some Live Charts which have as input a single SQL query. I will now have to create complex tables without the aid of the excel tools I'm so familiar with.
The problem is the following:
We have telesales agents who book appointments by answering inbound calls and making outbound calls. These generate leads that might potentially result in a sale. The relevant tables and fields for this problem are these:
Contact Table: Agent
Sales Table: Price, OutboundCallDate
I want to know for each telesales agent their respective Total Sales amount in one column, and their outbound sales value in another.
The end result should look something like this:
+-------+------------+---------------+
| Agent | TotalSales | OutboundSales |
+-------+------------+---------------+
| Tom | 30145 | 0 |
| Sally | 16449 | 1000 |
| John | 10500 | 300 |
| Joe | 50710 | 0 |
+-------+------------+---------------+
With the below SQL I get the following result:
SELECT contact.agent, SUM(sales.price)
FROM contact, sales
WHERE contact.id = sales.id
GROUP BY contact.agent
+-------+------------+
| Agent | TotalSales |
+-------+------------+
| Tom | 30145 |
| Sally | 16449 |
| John | 10500 |
| Joe | 50710 |
+-------+------------+
I want to add the third column to this query result, in which the price is summed only for records where the OutboundCallDate field contains data. Something a bit like (where sales.OutboundCallDate is Not Null)
I hope this is clear enough. Let me know if that's not the case.
Use CASE
SELECT c.Agent,
SUM(s.price) AS TotalSales,
SUM(CASE
WHEN s.OutboundCallDate IS NOT NULL THEN s.price
ELSE 0
END) AS OutboundSales
FROM contact c, sales s
WHERE c.id = s.id
GROUP BY c.agent
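I think the code would look like this if you only need the outbound total (agents with no outbound sales drop out of the result):
SELECT contact.agent, SUM(sales.price)
FROM contact, sales
WHERE contact.id = sales.id AND sales.OutboundCallDate IS NOT NULL
GROUP BY contact.agent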
I'm assuming your Sales table contains something like Units and Price. If it's just a sales amount, then replace the calculation with the sales amount field name.
The key thing here is that the value summed should only be the sales amount if the OutboundCallDate exists. If the OutboundCallDate is NULL, then we're using a value of 0 for that row.
select Agent.Agent, TotalSales = sum (sales.Price*Units)
, OutboundSales = sum (
case when Outboundcalldate is not null then price*Units
else 0
end)
From Sales inner join Agent on Sales.Agent = Agent.Agent
Group by Agent.Agent
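The conditional-sum pattern used in these answers can be checked with a small sketch. It runs on SQLite via Python's sqlite3 module rather than SQL Server, and the rows and amounts are invented for the illustration (they are not the figures from the question); only the contact/sales column names follow the question's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, agent TEXT)")
conn.execute("CREATE TABLE sales (id INTEGER, price INTEGER, OutboundCallDate TEXT)")

# Made-up sample data: a NULL OutboundCallDate means the sale was not outbound.
conn.executemany(
    "INSERT INTO contact VALUES (?, ?)",
    [(1, "Tom"), (2, "Sally"), (3, "Sally"), (4, "John")],
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 100, None), (2, 200, "2020-01-05"), (3, 50, None), (4, 300, "2020-02-01")],
)

# SUM over all rows for the total; SUM over a CASE that contributes the price
# only when OutboundCallDate is present, and 0 otherwise.
rows = conn.execute(
    """
    SELECT c.agent,
           SUM(s.price) AS TotalSales,
           SUM(CASE WHEN s.OutboundCallDate IS NOT NULL THEN s.price ELSE 0 END)
               AS OutboundSales
    FROM contact AS c
    JOIN sales AS s ON c.id = s.id
    GROUP BY c.agent
    ORDER BY c.agent
    """
).fetchall()
print(rows)
```

Because the CASE falls back to 0 instead of filtering rows out, agents without any outbound sales still appear, with an OutboundSales of 0.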

Showing data in group by

I am running a query that gets data from multiple tables using inner joins and a condition. I want to group this query by a single column, but when I do, I get the error ORA-00979: not a GROUP BY expression. As I understand it, this is because the selected columns from the other tables are not part of the GROUP BY.
I am writing this query to generate reports in iReport. For example, I get the columns below from three different tables (details, food, and hobbies), and I want to combine the result grouped by name:
Name | food | hobby
-------------------------
peter | chips | traveling
peter | burger | tennis
peter | burger | writing
Dave | lamb | game
Dave | kebab | reading
The final result I want is shown below: each name should appear only once, followed by all of its values (even when they are duplicates), and the remaining rows for the same name should leave the name blank. Please help me with this SQL query. If there is an option in iReport to do this, or any other keyword or inner query in SQL, please let me know. I tried the group-by option available when designing a table in iReport, but it is not working. Thanks in advance.
Name | food | hobby
--------------------------------------------------------------
peter | chips | traveling
------ | burger | tennis
------ | burger | writing
Dave | lamb | game
-------| kebab | reading
Query for it:
SELECT org.Location AS organisation_location, list.listId as list_listid, org.Centre AS org_Centre,
org.Department AS org_Department, org.Position AS org_Position, q.content AS q_content,
q.dueTime AS q_dueTime, a.submitted_date AS a_submitted_date, list.frequency AS list_frequency,
a.comments AS a_comments, a.userid AS a_userid, a.submitted as a_submitted
FROM org INNER JOIN list ON org.id = list.org_id INNER JOIN q ON list.id = q.list_id INNER JOIN a ON q.id = a.q_id
WHERE a.submitted=0 and list.listid='xyz'
I want to group the same by list.listid
Your query doesn't contain "Name", "Food" or "Hobby", so I'm a little confused, but the following query should help you write your own to achieve the desired goal.
SELECT
CASE WHEN X.VERIFY_COL = 1 THEN X.YOUR_UNIQUE_COL ELSE NULL END AS YOUR_COL_NAME,
* FROM
(SELECT
ROW_NUMBER() OVER (PARTITION BY YOUR_UNIQUE_COL ORDER BY YOUR_UNIQUE_COL) AS VERIFY_COL,
* FROM YOUR_VIEW
) X
You can partition your data by the column you would like to show only once in your query (YOUR_UNIQUE_COL), then use ROW_NUMBER() to set the name to NULL for every row whose row number is greater than 1.
Please note it's SQL SERVER solution. What database engine do you use?
I don't think you need to group your data, try deactivating "Print repeated values"

SQL duration between dates for different persons

hopefully someone can help me with the following task:
I have two tables, 'Treatment' and 'Person'. Treatment contains the dates when treatments for the different persons were started; Person contains personal information, e.g. the last name.
Now I have to find all persons where the duration between the first and last treatment is over 20 years.
The Tables look something like this:
Person
| PK_Person | First name | Name |
_________________________________
| 1 | A_Test | Karl |
| 2 | B_Test | Marie |
| 3 | C_Test | Steve |
| 4 | D_Test | Jack |
Treatment
| PK_Treatment | Description | Starting time | PK_Person |
_________________________________________________________
| 1 | A | 01.01.1989 | 1
| 2 | B | 02.11.2001 | 1
| 3 | A | 05.01.2004 | 1
| 4 | C | 01.09.2013 | 1
| 5 | B | 01.01.1999 | 2
So in this example, the output should be person Karl, A_Test.
Hopefully it's understandable what the problem is and someone can help me.
Edit: There seems to be a problem with the formatting, the tables are not displayed correctly. I hope it's readable.
SELECT *
FROM person p
INNER JOIN Treatment t on t.PK_Person = p.PK_Person
WHERE DATEDIFF(year,[TREATMENT_DATE_1], [TREATMENT_DATE_2]) > 20
This should do it, it is however untested so will need tweaking to your schema
Your data looks a bit suspicious, because the first name doesn't look like a first name.
But, what you want to do is aggregate the Treatment table for each person and get the minimum and maximum starting times. When the difference is greater than 20 years, then keep the person, and join back to the person table to get the names.
select p.FirstName, p.LastName
from Person p join
(select pk_person, MIN(StartingTime) as minst, MAX(StartingTime) as maxst
from Treatment t
group by pk_person
having MAX(StartingTime) - MIN(StartingTime) > 20*365.25
) t
on p.pk_person = t.pk_person;
Note that date arithmetic does vary between databases. In most databases, taking the difference of two dates counts the number of days between them, so this is a pretty general approach (although not guaranteed to work on all databases).
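Since date arithmetic differs between databases, here is one concrete sketch of the min/max approach on SQLite, driven from Python's sqlite3 module. It assumes the starting times are stored as ISO yyyy-mm-dd text (the question shows dd.mm.yyyy) and uses SQLite's julianday() to get a difference in days; the column names FirstName, Name and StartingTime are adapted from the question's tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (PK_Person INTEGER PRIMARY KEY, FirstName TEXT, Name TEXT)")
conn.execute(
    "CREATE TABLE Treatment (PK_Treatment INTEGER PRIMARY KEY, Description TEXT, "
    "StartingTime TEXT, PK_Person INTEGER)"
)
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [(1, "A_Test", "Karl"), (2, "B_Test", "Marie"), (3, "C_Test", "Steve"), (4, "D_Test", "Jack")],
)
# Sample dates from the question, rewritten as ISO text so MIN/MAX sort correctly.
conn.executemany(
    "INSERT INTO Treatment VALUES (?, ?, ?, ?)",
    [
        (1, "A", "1989-01-01", 1),
        (2, "B", "2001-11-02", 1),
        (3, "A", "2004-01-05", 1),
        (4, "C", "2013-09-01", 1),
        (5, "B", "1999-01-01", 2),
    ],
)

# Aggregate treatments per person, keep those whose first and last treatment
# are more than 20 years (approximated as 20 * 365.25 days) apart,
# then join back to Person for the names.
rows = conn.execute(
    """
    SELECT p.Name, p.FirstName
    FROM Person AS p
    JOIN (SELECT PK_Person
          FROM Treatment
          GROUP BY PK_Person
          HAVING julianday(MAX(StartingTime)) - julianday(MIN(StartingTime)) > 20 * 365.25
         ) AS t
      ON p.PK_Person = t.PK_Person
    """
).fetchall()
print(rows)
```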
I've taken a slightly different approach and worked with SQL Fiddle to verify that the below statements work.
As mentioned previously, the data does seem a bit suspicious; nonetheless per your requirements, you would be able to do the following:
select P.PK_Person, p.FirstName, p.Name
from person P
inner join treatment T on T.pk_person = P.pk_person
where DATEDIFF((select x.startingtime from treatment x where x.pk_person = p.pk_person order by startingtime desc limit 1), T.StartingTime) > 7305
First, we need to inner join treatment, which will ignore any persons who are not in the treatment table. The where portion now just needs to select based on your criteria (in this case a difference of dates). The subquery returns the last date a person was treated; that date is compared to each of the person's records, and the rows are filtered by number of days (7305 = 20 years * 365.25).
Here is the working SQL Fiddle sample.