How to count states using case statement - sql

I am converting a column of state abbreviations to state names but I'd also like to get a count of each state name. When I try adding SELECT state,count(*) as count to the beginning of my query, I end up getting errors. Where can I add a count function to get an output with state names and count of each?
Code I'm using
query = """
SELECT
CASE
WHEN state = 'AL' THEN 'Alabama'
WHEN state = 'AK' THEN 'Alaska'
WHEN state = 'AZ' THEN 'Arizona'
WHEN state = 'AR' THEN 'Arkansas'
WHEN state = 'CA' THEN 'California'
WHEN state = 'CO' THEN 'Colorado'
WHEN state = 'CT' THEN 'Connecticut'
WHEN state = 'DE' THEN 'Delaware'
WHEN state = 'DC' THEN 'District of Columbia'
WHEN state = 'FL' THEN 'Florida'
WHEN state = 'GA' THEN 'Georgia'
WHEN state = 'HI' THEN 'Hawaii'
WHEN state = 'ID' THEN 'Idaho'
WHEN state = 'IL' THEN 'Illinois'
WHEN state = 'IN' THEN 'Indiana'
WHEN state = 'IA' THEN 'Iowa'
WHEN state = 'KS' THEN 'Kansas'
WHEN state = 'KY' THEN 'Kentucky'
WHEN state = 'LA' THEN 'Louisiana'
WHEN state = 'ME' THEN 'Maine'
WHEN state = 'MD' THEN 'Maryland'
WHEN state = 'MA' THEN 'Massachusetts'
WHEN state = 'MI' THEN 'Michigan'
WHEN state = 'MN' THEN 'Minnesota'
WHEN state = 'MS' THEN 'Mississippi'
WHEN state = 'MO' THEN 'Missouri'
WHEN state = 'MT' THEN 'Montana'
WHEN state = 'NE' THEN 'Nebraska'
WHEN state = 'NV' THEN 'Nevada'
WHEN state = 'NH' THEN 'New Hampshire'
WHEN state = 'NJ' THEN 'New Jersey'
WHEN state = 'NM' THEN 'New Mexico'
WHEN state = 'NY' THEN 'New York'
WHEN state = 'NC' THEN 'North Carolina'
WHEN state = 'ND' THEN 'North Dakota'
WHEN state = 'OH' THEN 'Ohio'
WHEN state = 'OK' THEN 'Oklahoma'
WHEN state = 'OR' THEN 'Oregon'
WHEN state = 'PA' THEN 'Pennsylvania'
WHEN state = 'RI' THEN 'Rhode Island'
WHEN state = 'SC' THEN 'South Carolina'
WHEN state = 'SD' THEN 'South Dakota'
WHEN state = 'TN' THEN 'Tennessee'
WHEN state = 'TX' THEN 'Texas'
WHEN state = 'UT' THEN 'Utah'
WHEN state = 'VT' THEN 'Vermont'
WHEN state = 'VA' THEN 'Virginia'
WHEN state = 'WA' THEN 'Washington'
WHEN state = 'WV' THEN 'West Virginia'
WHEN state = 'WI' THEN 'Wisconsin'
WHEN state = 'WY' THEN 'Wyoming'
WHEN state = 'AB' THEN 'Alberta'
WHEN state = 'BC' THEN 'British Columbia'
WHEN state = 'MB' THEN 'Manitoba'
WHEN state = 'NB' THEN 'New Brunswick'
WHEN state = 'NL' THEN 'Newfoundland and Labrador'
WHEN state = 'NT' THEN 'Northwest Territories'
WHEN state = 'NS' THEN 'Nova Scotia'
WHEN state = 'NU' THEN 'Nunavut'
WHEN state = 'ON' THEN 'Ontario'
WHEN state = 'PE' THEN 'Prince Edward Island'
WHEN state = 'QC' THEN 'Quebec'
WHEN state = 'SK' THEN 'Saskatchewan'
WHEN state = 'YT' THEN 'Yukon Territory'
END AS state
FROM business
"""
result = spark.sql(query)
result.show()
This returns one row per business with the full state name, but no counts.
I'd like one row per state with a count of each, just with full state names instead of abbreviations.

Here you go. Add COUNT(*) to the select list and group by state:
SELECT CASE WHEN state = 'AL' THEN 'Alabama' WHEN state = 'AK' THEN 'Alaska' WHEN state = 'AZ' THEN 'Arizona' WHEN state = 'AR' THEN 'Arkansas' WHEN state = 'CA' THEN 'California' WHEN state = 'CO' THEN 'Colorado' WHEN state = 'CT' THEN 'Connecticut' WHEN state = 'DE' THEN 'Delaware' WHEN state = 'DC' THEN 'District of Columbia' WHEN state = 'FL' THEN 'Florida' WHEN state = 'GA' THEN 'Georgia' WHEN state = 'HI' THEN 'Hawaii' WHEN state = 'ID' THEN 'Idaho' WHEN state = 'IL' THEN 'Illinois' WHEN state = 'IN' THEN 'Indiana' WHEN state = 'IA' THEN 'Iowa' WHEN state = 'KS' THEN 'Kansas' WHEN state = 'KY' THEN 'Kentucky' WHEN state = 'LA' THEN 'Louisiana' WHEN state = 'ME' THEN 'Maine' WHEN state = 'MD' THEN 'Maryland' WHEN state = 'MA' THEN 'Massachusetts' WHEN state = 'MI' THEN 'Michigan' WHEN state = 'MN' THEN 'Minnesota' WHEN state = 'MS' THEN 'Mississippi' WHEN state = 'MO' THEN 'Missouri' WHEN state = 'MT' THEN 'Montana' WHEN state = 'NE' THEN 'Nebraska' WHEN state = 'NV' THEN 'Nevada' WHEN state = 'NH' THEN 'New Hampshire' WHEN state = 'NJ' THEN 'New Jersey' WHEN state = 'NM' THEN 'New Mexico' WHEN state = 'NY' THEN 'New York' WHEN state = 'NC' THEN 'North Carolina' WHEN state = 'ND' THEN 'North Dakota' WHEN state = 'OH' THEN 'Ohio' WHEN state = 'OK' THEN 'Oklahoma' WHEN state = 'OR' THEN 'Oregon' WHEN state = 'PA' THEN 'Pennsylvania' WHEN state = 'RI' THEN 'Rhode Island' WHEN state = 'SC' THEN 'South Carolina' WHEN state = 'SD' THEN 'South Dakota' WHEN state = 'TN' THEN 'Tennessee' WHEN state = 'TX' THEN 'Texas' WHEN state = 'UT' THEN 'Utah' WHEN state = 'VT' THEN 'Vermont' WHEN state = 'VA' THEN 'Virginia' WHEN state = 'WA' THEN 'Washington' WHEN state = 'WV' THEN 'West Virginia' WHEN state = 'WI' THEN 'Wisconsin' WHEN state = 'WY' THEN 'Wyoming'
WHEN state = 'AB' THEN 'Alberta' WHEN state = 'BC' THEN 'British Columbia' WHEN state = 'MB' THEN 'Manitoba' WHEN state = 'NB' THEN 'New Brunswick' WHEN state = 'NL' THEN 'Newfoundland and Labrador' WHEN state = 'NT' THEN 'Northwest Territories' WHEN state = 'NS' THEN 'Nova Scotia' WHEN state = 'NU' THEN 'Nunavut' WHEN state = 'ON' THEN 'Ontario' WHEN state = 'PE' THEN 'Prince Edward Island' WHEN state = 'QC' THEN 'Quebec' WHEN state = 'SK' THEN 'Saskatchewan' WHEN state = 'YT' THEN 'Yukon Territory' END AS state,
COUNT(*) AS count
FROM business
GROUP BY state
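A minimal, runnable sketch of the same pattern, using Python's built-in sqlite3 module instead of Spark so it runs anywhere. The business table, its sample rows, and the function name count_states are made up for illustration, and the CASE is trimmed to three codes; the point is that COUNT(*) sits next to the CASE expression and the GROUP BY makes it count per state.

```python
import sqlite3


def count_states(codes):
    """Return (state_name, state_count) pairs for a hypothetical business(state) table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE business (state TEXT)")
    conn.executemany("INSERT INTO business (state) VALUES (?)", [(c,) for c in codes])
    query = """
        SELECT CASE
                 WHEN state = 'AZ' THEN 'Arizona'
                 WHEN state = 'NV' THEN 'Nevada'
                 WHEN state = 'NB' THEN 'New Brunswick'
                 ELSE state                  -- keep unmapped codes visible
               END AS state_name,
               COUNT(*) AS state_count       -- one count per group
        FROM business
        GROUP BY state_name                  -- SQLite accepts the alias here
        ORDER BY state_count DESC, state_name
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(count_states(["AZ", "NV", "AZ", "NB", "AZ", "NV"]))
# [('Arizona', 3), ('Nevada', 2), ('New Brunswick', 1)]
```

A distinct alias (state_name) avoids any ambiguity with the raw state column. Grouping by the raw abbreviation, as the answer does, gives the same groups because each code maps to exactly one name.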

Related

Population by United States Region

Using SQL: let's say we have a dataset of population by state (e.g., Vermont, 623,251, and so on), but we want to know the population by United States region (e.g., Midwest, 68,985,454). Could you describe how you would go about doing that?
Dataset from census.gov
Where I'm stuck:
--First I created a table with a state and population column.
CREATE TABLE states (
state VARCHAR(20),
population INT
);
--Then I uploaded a CSV file from census.gov that I cleaned up.
SELECT * FROM states;
--Created a temporary table to add in the region column.
DROP TABLE IF EXISTS temp_Regions;
CREATE TEMP TABLE temp_Regions (
state VARCHAR(20),
state_pop INT,
region VARCHAR(20),
region_pop INT
);
INSERT INTO temp_Regions
SELECT state, population
FROM states;
--Used CASE WHEN statements to put states into their respective regions.
SELECT state,
CASE WHEN state IN ('Connecticut', 'Maine', 'Massachusetts', 'New Hampshire', 'Rhode Island', 'Vermont', 'New Jersey', 'New York', 'Pennsylvania') THEN 'Northeast'
WHEN state IN ('Illinois', 'Indiana', 'Michigan', 'Ohio', 'Wisconsin', 'Iowa', 'Kansas', 'Minnesota', 'Missouri', 'Nebraska', 'North Dakota', 'South Dakota') THEN 'Midwest'
WHEN state IN ('Delaware', 'District of Columbia', 'Florida', 'Georgia', 'Maryland', 'North Carolina', 'South Carolina', 'Virginia', 'West Virginia', 'Alabama', 'Kentucky', 'Mississippi', 'Tennessee', 'Arkansas', 'Louisiana', 'Oklahoma', 'Texas') THEN 'South'
WHEN state IN ('Arizona', 'Colorado', 'Idaho', 'Montana', 'Nevada', 'New Mexico', 'Utah', 'Wyoming', 'Alaska', 'California', 'Hawaii', 'Oregon', 'Washington') THEN 'West'
END AS region, state_pop, region_pop
FROM temp_Regions;
--Now I'm stuck at this point. I'm unable to get data into the region_pop column. How do I get the sum of the populations by U.S. Region?
Let me know if you need further clarification on things. Thanks for your help y'all!
You can use the analytic function SUM() OVER (PARTITION BY region) to achieve this:
with data
as (
SELECT state
,CASE WHEN state IN ('Connecticut', 'Maine', 'Massachusetts', 'New Hampshire', 'Rhode Island', 'Vermont', 'New Jersey', 'New York', 'Pennsylvania') THEN 'Northeast'
WHEN state IN ('Illinois', 'Indiana', 'Michigan', 'Ohio', 'Wisconsin', 'Iowa', 'Kansas', 'Minnesota', 'Missouri', 'Nebraska', 'North Dakota', 'South Dakota') THEN 'Midwest'
WHEN state IN ('Delaware', 'District of Columbia', 'Florida', 'Georgia', 'Maryland', 'North Carolina', 'South Carolina', 'Virginia', 'West Virginia', 'Alabama', 'Kentucky', 'Mississippi', 'Tennessee', 'Arkansas', 'Louisiana', 'Oklahoma', 'Texas') THEN 'South'
WHEN state IN ('Arizona', 'Colorado', 'Idaho', 'Montana', 'Nevada', 'New Mexico', 'Utah', 'Wyoming', 'Alaska', 'California', 'Hawaii', 'Oregon', 'Washington') THEN 'West'
END AS region
, state_pop
FROM temp_Regions
)
select state
,region
,state_pop
,sum(state_pop) over(partition by region) as region_population
from data
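The same window-function approach, runnable with Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The data table mirrors the CTE above, but its rows are made-up round numbers, not census figures, and region_populations is a hypothetical helper name. The second query shows a plain GROUP BY, which is enough if you only need one row per region rather than a region total on every state row.

```python
import sqlite3


def region_populations(rows):
    """rows: (state, region, state_pop) tuples for a hypothetical `data` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (state TEXT, region TEXT, state_pop INTEGER)")
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    # Window function: every state row keeps its own population
    # and also gets the total of all rows in the same region.
    per_state = conn.execute("""
        SELECT state, region, state_pop,
               SUM(state_pop) OVER (PARTITION BY region) AS region_population
        FROM data
        ORDER BY region, state
    """).fetchall()
    # Plain aggregation: one row per region.
    per_region = conn.execute("""
        SELECT region, SUM(state_pop) AS region_population
        FROM data
        GROUP BY region
        ORDER BY region
    """).fetchall()
    conn.close()
    return per_state, per_region


sample = [("Iowa", "Midwest", 100), ("Ohio", "Midwest", 300), ("Maine", "Northeast", 50)]
print(region_populations(sample))
```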

SQL Select Convert State Name To Abbreviation

In a SQL select statement, how to convert a full state name to state abbreviation (e.g. New York to NY)? I'd like to do this without joins if possible. What would the regexp_replace look like?
select regexp_replace(table.state, 'New York', 'NY', 'g') as state
Can this approach be done en masse for all states?
For reference, here is a list of state names and abbreviations: https://gist.github.com/esfand/9443427.
Here it is in raw WHEN/THEN form if needed, along with Canadian provinces:
CASE your_state_column -- replace with the column containing full state names
WHEN 'Alabama' THEN 'AL'
WHEN 'Alaska' THEN 'AK'
WHEN 'Arizona' THEN 'AZ'
WHEN 'Arkansas' THEN 'AR'
WHEN 'California' THEN 'CA'
WHEN 'Colorado' THEN 'CO'
WHEN 'Connecticut' THEN 'CT'
WHEN 'Delaware' THEN 'DE'
WHEN 'District of Columbia' THEN 'DC'
WHEN 'Florida' THEN 'FL'
WHEN 'Georgia' THEN 'GA'
WHEN 'Hawaii' THEN 'HI'
WHEN 'Idaho' THEN 'ID'
WHEN 'Illinois' THEN 'IL'
WHEN 'Indiana' THEN 'IN'
WHEN 'Iowa' THEN 'IA'
WHEN 'Kansas' THEN 'KS'
WHEN 'Kentucky' THEN 'KY'
WHEN 'Louisiana' THEN 'LA'
WHEN 'Maine' THEN 'ME'
WHEN 'Maryland' THEN 'MD'
WHEN 'Massachusetts' THEN 'MA'
WHEN 'Michigan' THEN 'MI'
WHEN 'Minnesota' THEN 'MN'
WHEN 'Mississippi' THEN 'MS'
WHEN 'Missouri' THEN 'MO'
WHEN 'Montana' THEN 'MT'
WHEN 'Nebraska' THEN 'NE'
WHEN 'Nevada' THEN 'NV'
WHEN 'New Hampshire' THEN 'NH'
WHEN 'New Jersey' THEN 'NJ'
WHEN 'New Mexico' THEN 'NM'
WHEN 'New York' THEN 'NY'
WHEN 'North Carolina' THEN 'NC'
WHEN 'North Dakota' THEN 'ND'
WHEN 'Ohio' THEN 'OH'
WHEN 'Oklahoma' THEN 'OK'
WHEN 'Oregon' THEN 'OR'
WHEN 'Pennsylvania' THEN 'PA'
WHEN 'Rhode Island' THEN 'RI'
WHEN 'South Carolina' THEN 'SC'
WHEN 'South Dakota' THEN 'SD'
WHEN 'Tennessee' THEN 'TN'
WHEN 'Texas' THEN 'TX'
WHEN 'Utah' THEN 'UT'
WHEN 'Vermont' THEN 'VT'
WHEN 'Virginia' THEN 'VA'
WHEN 'Washington' THEN 'WA'
WHEN 'West Virginia' THEN 'WV'
WHEN 'Wisconsin' THEN 'WI'
WHEN 'Wyoming' THEN 'WY'
WHEN 'Alberta' THEN 'AB'
WHEN 'British Columbia' THEN 'BC'
WHEN 'Manitoba' THEN 'MB'
WHEN 'New Brunswick' THEN 'NB'
WHEN 'Newfoundland and Labrador' THEN 'NL'
WHEN 'Northwest Territories' THEN 'NT'
WHEN 'Nova Scotia' THEN 'NS'
WHEN 'Nunavut' THEN 'NU'
WHEN 'Ontario' THEN 'ON'
WHEN 'Prince Edward Island' THEN 'PE'
WHEN 'Quebec' THEN 'QC'
WHEN 'Saskatchewan' THEN 'SK'
WHEN 'Yukon Territory' THEN 'YT'
ELSE NULL
END
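With PostgreSQL you can use a JSON object as the lookup table. The ->> operator returns the value as text (-> would return it as json, i.e. "AL" including the quotes):
select '{"Alabama": "AL", "Alaska": "AK"}'::json->>'Alabama'
You can also use a column reference instead of a string literal:
select
'{"Alabama": "AL", "Alaska": "AK"}'::json->>example.state
from
(values ('Alabama')) example(state)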
As the comments suggested, a join is needed. Below is what I ended up doing; let me know if there is a better way.
with states(name, abbr) as (
select
*
from
(values ('Alabama', 'AL'),
('Alaska', 'AK'),
('Arizona', 'AZ'),
('Arkansas', 'AR'),
('California', 'CA'),
('Colorado', 'CO'),
('Connecticut', 'CT'),
('Delaware', 'DE'),
('District of Columbia', 'DC'),
('Florida', 'FL'),
('Georgia', 'GA'),
('Hawaii', 'HI'),
('Idaho', 'ID'),
('Illinois', 'IL'),
('Indiana', 'IN'),
('Iowa', 'IA'),
('Kansas', 'KS'),
('Kentucky', 'KY'),
('Louisiana', 'LA'),
('Maine', 'ME'),
('Maryland', 'MD'),
('Massachusetts', 'MA'),
('Michigan', 'MI'),
('Minnesota', 'MN'),
('Mississippi', 'MS'),
('Missouri', 'MO'),
('Montana', 'MT'),
('Nebraska', 'NE'),
('Nevada', 'NV'),
('New Hampshire', 'NH'),
('New Jersey', 'NJ'),
('New Mexico', 'NM'),
('New York', 'NY'),
('North Carolina', 'NC'),
('North Dakota', 'ND'),
('Ohio', 'OH'),
('Oklahoma', 'OK'),
('Oregon', 'OR'),
('Pennsylvania', 'PA'),
('Rhode Island', 'RI'),
('South Carolina', 'SC'),
('South Dakota', 'SD'),
('Tennessee', 'TN'),
('Texas', 'TX'),
('Utah', 'UT'),
('Vermont', 'VT'),
('Virginia', 'VA'),
('Washington', 'WA'),
('West Virginia', 'WV'),
('Wisconsin', 'WI'),
('Wyoming', 'WY')) as state
)
Then, in the main query, look up each abbreviation with a scalar subquery: select (select states.abbr from states where states.name = state_name)
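A runnable sketch of the same lookup-table idea written as an explicit LEFT JOIN, using Python's sqlite3 (SQLite also accepts a VALUES list inside a CTE with named columns). The places table, its state_name column, the sample names, and the helper abbreviate are hypothetical, and the states list is cut to three rows. With a LEFT JOIN, names missing from the lookup come back as NULL (None) instead of dropping the row.

```python
import sqlite3


def abbreviate(names):
    """Return (state_name, abbr) pairs; abbr is None when the name is not in the lookup."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE places (state_name TEXT)")
    conn.executemany("INSERT INTO places VALUES (?)", [(n,) for n in names])
    rows = conn.execute("""
        WITH states(name, abbr) AS (
            VALUES ('Alabama', 'AL'), ('Alaska', 'AK'), ('New York', 'NY')
        )
        SELECT p.state_name, s.abbr
        FROM places AS p
        LEFT JOIN states AS s ON s.name = p.state_name   -- keep unmatched rows
        ORDER BY p.rowid                                  -- preserve insertion order
    """).fetchall()
    conn.close()
    return rows


print(abbreviate(["New York", "Alabama", "Ontario"]))
# [('New York', 'NY'), ('Alabama', 'AL'), ('Ontario', None)]
```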

How to use cartopy to create colored US states

I need to create a map where states have different colors depending on a piece of data about that state. I found an example of a US map in the cartopy gallery, but it didn't demonstrate how to refer to the states and access their attributes, and there is little else out there.
From the example, I've simplified their code to the following, and would appreciate any help with modifying it so that the face color of each state is set according to the magnitude of popdensity for that state.
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.io.shapereader as shpreader
fig = plt.figure()
ax = fig.add_axes([0, 0, 1, 1], projection=ccrs.LambertConformal())
ax.set_extent([-125, -66.5, 20, 50], ccrs.Geodetic())
shapename = 'admin_1_states_provinces_lakes_shp'
states_shp = shpreader.natural_earth(resolution='110m',
category='cultural', name=shapename)
popdensity = {
'New Jersey': 438.00,
'Rhode Island': 387.35,
'Massachusetts': 312.68,
'Connecticut': 271.40,
'Maryland': 209.23,
'New York': 155.18,
'Delaware': 154.87,
'Florida': 114.43,
'Ohio': 107.05,
'Pennsylvania': 105.80,
'Illinois': 86.27,
'California': 83.85,
'Virginia': 69.03,
'Michigan': 67.55,
'Indiana': 65.46,
'North Carolina': 63.80,
'Georgia': 54.59,
'Tennessee': 53.29,
'New Hampshire': 53.20,
'South Carolina': 51.45,
'Louisiana': 39.61,
'Kentucky': 39.28,
'Wisconsin': 38.13,
'Washington': 34.20,
'Alabama': 33.84,
'Missouri': 31.36,
'Texas': 30.75,
'West Virginia': 29.00,
'Vermont': 25.41,
'Minnesota': 23.86,
'Mississippi': 23.42,
'Iowa': 20.22,
'Arkansas': 19.82,
'Oklahoma': 19.40,
'Arizona': 17.43,
'Colorado': 16.01,
'Maine': 15.95,
'Oregon': 13.76,
'Kansas': 12.69,
'Utah': 10.50,
'Nebraska': 8.60,
'Nevada': 7.03,
'Idaho': 6.04,
'New Mexico': 5.79,
'South Dakota': 3.84,
'North Dakota': 3.59,
'Montana': 2.39,
'Wyoming': 1.96}
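# background_patch and outline_patch were deprecated in Cartopy 0.18 and later removed;
# ax.patch and ax.spines['geo'] replace them.
ax.patch.set_visible(False)
ax.spines['geo'].set_visible(False)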
ax.set_title('State Population Density')
for state in shpreader.Reader(states_shp).geometries():
    ### I need to replace the following code with code that sets the
    ### facecolor as a gradient based on the population density above
    facecolor = [0.9375, 0.9375, 0.859375]
    edgecolor = 'black'
    ax.add_geometries([state], ccrs.PlateCarree(),
                      facecolor=facecolor, edgecolor=edgecolor)
plt.show()
To access the states' attributes, you need to iterate through .records() rather than .geometries(). Here is working code based on yours; read the comments in the portions I added or modified for clarification.
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.io.shapereader as shpreader
fig = plt.figure()
ax = fig.add_axes([0, 0, 1, 1], projection=ccrs.LambertConformal())
ax.set_extent([-125, -66.5, 20, 50], ccrs.Geodetic())
shapename = 'admin_1_states_provinces_lakes_shp'
states_shp = shpreader.natural_earth(resolution='110m',
category='cultural', name=shapename)
popdensity = {
'New Jersey': 438.00,
'Rhode Island': 387.35,
'Massachusetts': 312.68,
'Connecticut': 271.40,
'Maryland': 209.23,
'New York': 155.18,
'Delaware': 154.87,
'Florida': 114.43,
'Ohio': 107.05,
'Pennsylvania': 105.80,
'Illinois': 86.27,
'California': 83.85,
'Virginia': 69.03,
'Michigan': 67.55,
'Indiana': 65.46,
'North Carolina': 63.80,
'Georgia': 54.59,
'Tennessee': 53.29,
'New Hampshire': 53.20,
'South Carolina': 51.45,
'Louisiana': 39.61,
'Kentucky': 39.28,
'Wisconsin': 38.13,
'Washington': 34.20,
'Alabama': 33.84,
'Missouri': 31.36,
'Texas': 30.75,
'West Virginia': 29.00,
'Vermont': 25.41,
'Minnesota': 23.86,
'Mississippi': 23.42,
'Iowa': 20.22,
'Arkansas': 19.82,
'Oklahoma': 19.40,
'Arizona': 17.43,
'Colorado': 16.01,
'Maine': 15.95,
'Oregon': 13.76,
'Kansas': 12.69,
'Utah': 10.50,
'Nebraska': 8.60,
'Nevada': 7.03,
'Idaho': 6.04,
'New Mexico': 5.79,
'South Dakota': 3.84,
'North Dakota': 3.59,
'Montana': 2.39,
'Wyoming': 1.96}
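# background_patch and outline_patch were deprecated in Cartopy 0.18 and later removed;
# ax.patch and ax.spines['geo'] replace them.
ax.patch.set_visible(False)
ax.spines['geo'].set_visible(False)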
ax.set_title('State Population Density')
#for state in shpreader.Reader(states_shp).geometries():
for astate in shpreader.Reader(states_shp).records():
    # The facecolor is now chosen from the population density
    # instead of the fixed value below
    #facecolor = [0.9375, 0.9375, 0.859375]
    edgecolor = 'black'
    try:
        # use the name of this state to get pop_density
        state_dens = popdensity[astate.attributes['name']]
    except KeyError:
        # names missing from popdensity (e.g. Alaska, Hawaii) get 0
        state_dens = 0
    # simple scheme to assign color to each state
    if state_dens < 40:
        facecolor = "lightyellow"
    elif state_dens > 200:
        facecolor = "red"
    else:
        facecolor = "pink"
    # `astate.geometry` is the polygon to plot
    ax.add_geometries([astate.geometry], ccrs.PlateCarree(),
                      facecolor=facecolor, edgecolor=edgecolor)
plt.show()
The resulting plot:
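The three-bucket scheme above is not quite the gradient the question asked for. A minimal stdlib-only sketch of a continuous gradient follows: it places each density on a log scale between the smallest and largest values (the dict spans roughly 2 to 438, so a linear scale would leave most states near one end) and interpolates between two RGB colours. The names density_to_rgb, LOW_RGB and HIGH_RGB are hypothetical; matplotlib accepts an RGB tuple of floats in 0..1 as facecolor.

```python
import math

# Endpoint colours as RGB floats in 0..1 (roughly "lightyellow" and "red").
LOW_RGB = (1.0, 1.0, 0.878)
HIGH_RGB = (1.0, 0.0, 0.0)


def density_to_rgb(density, vmin, vmax, low=LOW_RGB, high=HIGH_RGB):
    """Map a density to a colour between `low` and `high` on a log scale."""
    if density is None or density <= 0:
        # Missing data (e.g. a state not in the dict) gets the low colour.
        return low
    # Clamp into [vmin, vmax], then find the fraction t of the way along the log range.
    d = min(max(density, vmin), vmax)
    t = (math.log(d) - math.log(vmin)) / (math.log(vmax) - math.log(vmin))
    # Linear interpolation of each channel.
    return tuple(lo + t * (hi - lo) for lo, hi in zip(low, high))


print(density_to_rgb(438.0, 1.96, 438.0))  # (1.0, 0.0, 0.0)
```

In the answer's loop you would compute vmin, vmax = min(popdensity.values()), max(popdensity.values()) once before the loop, then replace the if/elif block with facecolor = density_to_rgb(popdensity.get(astate.attributes['name']), vmin, vmax). matplotlib.colors.LogNorm combined with a matplotlib colormap does the same job with a richer palette.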

How do you only return one field when there are multiple entries for each field?

I am trying to return only one email address for each employee. A person can be both an employee and a student. If they have both an employee and a student email address, I only want to return the employee email address; if they only have a student email address, I want to return the student email address.
Here is the entire query:
select --spriden_pidm as pidm,
spriden_id as ban_id,
spriden_last_name as lastname,
spriden_first_name as firstname,
gmal.email,
phone_number.area || phone_number.phone as phone_number,
addr.permanent_address AS street,
addr.permanent_city AS city,
addr.permanent_state AS state,
addr.permanent_zip AS zip,
case
when nbrjobs_ecls_code in ('E1', 'E2', 'EN', 'F1', 'F2') and nbrjobs_ann_salary between 0 and 49999.99 then 'EHRA1'
when nbrjobs_ecls_code in ('E1', 'E2', 'EN', 'F1', 'F2') and nbrjobs_ann_salary between 50000 and 99999.99 then 'EHRA2'
when nbrjobs_ecls_code in ('E1', 'E2', 'EN', 'F1', 'F2') and nbrjobs_ann_salary between 100000 and 149999.99 then 'EHRA3'
when nbrjobs_ecls_code in ('E1', 'E2', 'EN', 'F1', 'F2') and nbrjobs_ann_salary >= 150000 then 'EHRA4'
when nbrjobs_ecls_code in ('SE', 'SN', 'LE') and nbrjobs_ann_salary between 0 and 49999.99 then 'SHRA1'
when nbrjobs_ecls_code in ('SE', 'SN', 'LE') and nbrjobs_ann_salary between 50000 and 99999.99 then 'SHRA2'
when nbrjobs_ecls_code in ('SE', 'SN', 'LE') and nbrjobs_ann_salary between 100000 and 149999.99 then 'SHRA3'
when nbrjobs_ecls_code in ('SE', 'SN', 'LE') and nbrjobs_ann_salary >= 150000 then 'SHRA4'
when nbrjobs_ecls_code in ('FA') then 'AF'
when nbrjobs_ecls_code in ('SH', 'SS', 'TS', 'WS') then 'M1'
else
null
end as empl_cat
from nbrjobs a,
spriden,
(select goremal_pidm as pidm,
goremal_email_address as email
from goremal
where goremal_emal_code in ('EMPL', 'STDN')
and goremal_status_ind = 'A') gmal,
(SELECT sprtele_pidm AS pidm,
sprtele_phone_area AS area,
sprtele_phone_number AS phone
FROM sprtele c
WHERE sprtele_tele_code = 'CA'
AND sprtele_primary_ind = 'Y'
AND sprtele_status_ind IS NULL
AND sprtele_seqno =
(SELECT MAX (sprtele_seqno)
FROM sprtele
WHERE sprtele_tele_code = 'CA'
AND sprtele_primary_ind = 'Y'
AND sprtele_status_ind IS NULL
AND sprtele_pidm = c.sprtele_pidm)) phone_number,
--spraddr
(SELECT spraddr_pidm AS pidm,
spraddr_street_line1 AS permanent_address,
spraddr_city AS permanent_city,
spraddr_stat_code AS permanent_state,
spraddr_zip AS permanent_zip
FROM spraddr b
WHERE spraddr_atyp_code = 'CA'
AND spraddr_status_ind IS NULL
AND spraddr_seqno =
(SELECT MAX (spraddr_seqno)
FROM spraddr
WHERE spraddr_atyp_code = 'CA'
AND spraddr_status_ind IS NULL
AND spraddr_pidm = b.spraddr_pidm)) addr
where a.nbrjobs_pidm = spriden_pidm
and a.nbrjobs_pidm = gmal.pidm(+)
and a.nbrjobs_pidm = phone_number.pidm(+)
and a.nbrjobs_pidm = addr.pidm(+)
and spriden_change_ind is null
and a.nbrjobs_sgrp_code = to_char(sysdate, 'YYYY')
and a.nbrjobs_effective_date = (select max(b.nbrjobs_effective_date)
from nbrjobs b
where b.nbrjobs_pidm = a.nbrjobs_pidm
and b.nbrjobs_posn = a.nbrjobs_posn
and b.nbrjobs_effective_date <= sysdate
--and b.nbrjobs_ecls_code in ('E1','E2','EN','F1','F2','SE','SN','LE')
and b.nbrjobs_ecls_code in ('E1','E2','EN','F1','F2','SE','SN','LE', 'RF', 'AF', 'FA', 'SH', 'SS', 'TS', 'WS')
and b.nbrjobs_sgrp_code = to_char(sysdate, 'YYYY'))
and a.nbrjobs_status <> 'T';
And this is the part of the query I am trying to change so that it returns the desired email address:
(select goremal_pidm as pidm,
goremal_email_address as email
from goremal
where goremal_emal_code in ('EMPL', 'STDN')
and goremal_status_ind = 'A') gmal,
So the issue is that the query returns two email addresses when the employee is also a student? In that case you can PIVOT the data, then use COALESCE() to fall back to the student email when the employee email is NULL. The query below would replace the problematic subquery (wrap it in parentheses and keep the gmal alias so the outer join on gmal.pidm still works):
SELECT pidm, COALESCE(empl_email, stdn_email) AS email
FROM (
SELECT goremal_pidm AS pidm, goremal_email_address AS email, goremal_emal_code
FROM goremal
WHERE goremal_emal_code in ('EMPL', 'STDN')
AND goremal_status_ind = 'A'
) PIVOT (
MAX(email) FOR goremal_emal_code IN ('EMPL' AS empl_email, 'STDN' AS stdn_email)
)
EDIT: As an aside, you can use conditional aggregation instead of an explicit PIVOT (helpful if you're on Oracle 10g or earlier, since PIVOT was introduced in 11g):
SELECT pidm, COALESCE(empl_email, stdn_email) AS email FROM (
SELECT goremal_pidm AS pidm
, MAX(CASE WHEN goremal_emal_code = 'EMPL' THEN goremal_email_address END) AS empl_email
, MAX(CASE WHEN goremal_emal_code = 'STDN' THEN goremal_email_address END) AS stdn_email
FROM goremal
WHERE goremal_emal_code in ('EMPL', 'STDN')
AND goremal_status_ind = 'A'
GROUP BY goremal_pidm
)
Hope this helps.
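A runnable sketch of the conditional-aggregation version, using Python's sqlite3 (SQLite has no PIVOT, so this is the variant that ports directly). The goremal table is trimmed to the four columns the subquery uses, and the sample rows and helper name preferred_emails are made up. Person 3's employee address is inactive, so the student address wins.

```python
import sqlite3


def preferred_emails(rows):
    """rows: (pidm, email_address, emal_code, status_ind) for a simplified goremal table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE goremal (
        goremal_pidm INTEGER, goremal_email_address TEXT,
        goremal_emal_code TEXT, goremal_status_ind TEXT)""")
    conn.executemany("INSERT INTO goremal VALUES (?, ?, ?, ?)", rows)
    result = conn.execute("""
        SELECT pidm, COALESCE(empl_email, stdn_email) AS email   -- prefer EMPL
        FROM (
            SELECT goremal_pidm AS pidm,
                   MAX(CASE WHEN goremal_emal_code = 'EMPL' THEN goremal_email_address END) AS empl_email,
                   MAX(CASE WHEN goremal_emal_code = 'STDN' THEN goremal_email_address END) AS stdn_email
            FROM goremal
            WHERE goremal_emal_code IN ('EMPL', 'STDN')
              AND goremal_status_ind = 'A'
            GROUP BY goremal_pidm            -- one row per person
        )
        ORDER BY pidm
    """).fetchall()
    conn.close()
    return result


sample = [
    (1, "ann@work.example", "EMPL", "A"),
    (1, "ann@school.example", "STDN", "A"),
    (2, "bob@school.example", "STDN", "A"),
    (3, "cy@old.example", "EMPL", "I"),
    (3, "cy@school.example", "STDN", "A"),
]
print(preferred_emails(sample))
```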
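Try using NVL2; for example, in your case:
NVL2(EMP_EMAIL_ADR, EMP_EMAIL_ADR, STDN_EMAIL_ADR)
This returns the employee email address if it is not null; otherwise it returns the student email address. Both addresses have to be on the same row for this to work (for example after pivoting them into separate columns), and NVL(EMP_EMAIL_ADR, STDN_EMAIL_ADR) gives the same result.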
Hope this helps.

SQL Server Group By with Joins

SELECT
[CustomerKey] AS 'Cust #',
CU.[CompanyName] AS 'Company Name',
ISS.InvoiceDate AS 'Invoice Date',
ISS.InvoiceTotal AS 'Invoice Total',
ISNULL(CU.ShopPhone,'') AS 'Company Shop',
ISNULL(CU.CellPhone,'') AS 'Company Cell',
ISNULL(CU.OfficePhone,'') AS 'Company Office',
ISNULL(CF.FirstName, '') AS 'FName',
ISNULL(CF.LastName,'') AS 'LName',
ISNULL(CF.WorkPhone,'') AS 'Contact Work',
ISNULL(CF.CellPhone,'') AS 'Contact Cell',
ISNULL(CF.HomePhone,'') AS 'Contact Home',
ISNULL(CF.EMail,'') AS 'Contact Email',
PSO.OutsidePartsSalespersonName
FROM
[ProfitMaster].[dbo].[vwAC_SSR_Customer] CU with (nolock)
LEFT JOIN
[ProfitMaster].[dbo].[vwAC_SSR_InvoiceSalesSummary] ISS with (nolock) ON CU.CustomerKey = ISS.Customer
JOIN
[ProfitMaster].[dbo].[vwSV_INV_PartsSalesOrder] PSO with (nolock) ON PSO.PartsSalesOrderInvoiceID = ISS.PartsSalesOrderInvoiceID
LEFT JOIN
(SELECT
EntityID, FirstName, LastName, WorkPhone, CellPhone, HomePhone, EMail
FROM
[ProfitMaster].[dbo].[vwGB_CON_ContactFull] with (nolock)
WHERE
EntityID IS NOT NULL
AND FirstName <> ''
AND EntityTypeID = '3'
AND SetDefault = '1'
GROUP BY
EntityID, FirstName, LastName, WorkPhone, CellPhone, HomePhone, EMail) AS CF ON CU.CustomerID = CF.EntityID
WHERE
CU.Inactive = '0'
AND ISS.InvoiceType = 'Parts Order'
AND ISS.InvoiceDate BETWEEN '2017-02-01 00:00:00.000' AND '2017-03-31 3:59:59.000'
AND CU.CustomerBaseBranchID = '1'
AND PSO.OutsidePartsSalespersonName IN ('Dave Freeland', 'Mark Miller', 'Ryan Oaks')
GROUP BY
CU.CustomerKey, CU.[CompanyName],
ISS.InvoiceDate, ISS.InvoiceTotal,
CU.ShopPhone, CU.CellPhone, CU.OfficePhone,
CF.FirstName, CF.LastName, CF.WorkPhone, CF.CellPhone,
CF.HomePhone, CF.EMail, PSO.OutsidePartsSalespersonName
ORDER BY
CU.CompanyName, ISS.InvoiceDate
How can I group this to SUM ISS.InvoiceTotal grouped by CustomerKey?
I keep getting the "is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause." error.
Any ideas?
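Once you SUM() ISS.InvoiceTotal, you can't also keep ISS.InvoiceTotal or ISS.InvoiceDate as plain columns in the select list or in the GROUP BY. Remove them from the GROUP BY and wrap them in aggregate functions instead: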
SELECT
[CustomerKey] AS 'Cust #',
CU.[CompanyName] AS 'Company Name',
MAX(ISS.InvoiceDate) AS 'Invoice Date', -- latest invoice date; LAST() is not a T-SQL function
SUM(ISS.InvoiceTotal) AS 'Invoice Total',
ISNULL(CU.ShopPhone,'') AS 'Company Shop',
ISNULL(CU.CellPhone,'') AS 'Company Cell',
ISNULL(CU.OfficePhone,'') AS 'Company Office',
ISNULL(CF.FirstName, '') AS 'FName',
ISNULL(CF.LastName,'') AS 'LName',
ISNULL(CF.WorkPhone,'') AS 'Contact Work',
ISNULL(CF.CellPhone,'') AS 'Contact Cell',
ISNULL(CF.HomePhone,'') AS 'Contact Home',
ISNULL(CF.EMail,'') AS 'Contact Email',
PSO.OutsidePartsSalespersonName
FROM
[ProfitMaster].[dbo].[vwAC_SSR_Customer] CU with (nolock)
LEFT JOIN
[ProfitMaster].[dbo].[vwAC_SSR_InvoiceSalesSummary] ISS with (nolock) ON CU.CustomerKey = ISS.Customer
JOIN
[ProfitMaster].[dbo].[vwSV_INV_PartsSalesOrder] PSO with (nolock) ON PSO.PartsSalesOrderInvoiceID = ISS.PartsSalesOrderInvoiceID
LEFT JOIN
(SELECT
EntityID, FirstName, LastName, WorkPhone, CellPhone, HomePhone, EMail
FROM
[ProfitMaster].[dbo].[vwGB_CON_ContactFull] with (nolock)
WHERE
EntityID IS NOT NULL
AND FirstName <> ''
AND EntityTypeID = '3'
AND SetDefault = '1'
GROUP BY
EntityID, FirstName, LastName, WorkPhone, CellPhone, HomePhone, EMail) AS CF ON CU.CustomerID = CF.EntityID
WHERE
CU.Inactive = '0'
AND ISS.InvoiceType = 'Parts Order'
AND ISS.InvoiceDate BETWEEN '2017-02-01 00:00:00.000' AND '2017-03-31 3:59:59.000'
AND CU.CustomerBaseBranchID = '1'
AND PSO.OutsidePartsSalespersonName IN ('Dave Freeland', 'Mark Miller', 'Ryan Oaks')
GROUP BY
CU.CustomerKey, CU.[CompanyName],
CU.ShopPhone, CU.CellPhone, CU.OfficePhone,
CF.FirstName, CF.LastName, CF.WorkPhone, CF.CellPhone,
CF.HomePhone, CF.EMail, PSO.OutsidePartsSalespersonName
ORDER BY
CU.CompanyName
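A stripped-down, runnable sketch of the same grouping rule using Python's sqlite3: only the non-aggregated columns go in the GROUP BY, and the invoice columns are aggregated (MAX for the date, SUM for the amount). The single invoices table, its sample rows, and the helper invoice_totals are hypothetical stand-ins for the joined ProfitMaster views; dates are ISO strings so MAX picks the latest.

```python
import sqlite3


def invoice_totals(rows):
    """rows: (customer_key, company_name, invoice_date, invoice_total)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE invoices (
        CustomerKey INTEGER, CompanyName TEXT, InvoiceDate TEXT, InvoiceTotal REAL)""")
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?, ?)", rows)
    result = conn.execute("""
        SELECT CustomerKey, CompanyName,
               MAX(InvoiceDate)  AS LastInvoiceDate,   -- latest invoice per customer
               SUM(InvoiceTotal) AS TotalInvoiced      -- summed per customer
        FROM invoices
        GROUP BY CustomerKey, CompanyName              -- only non-aggregated columns
        ORDER BY CompanyName
    """).fetchall()
    conn.close()
    return result


sample = [
    (1, "Acme", "2017-02-03", 100.0),
    (1, "Acme", "2017-03-15", 50.5),
    (2, "Birch", "2017-02-20", 20.25),
]
print(invoice_totals(sample))
# [(1, 'Acme', '2017-03-15', 150.5), (2, 'Birch', '2017-02-20', 20.25)]
```

In the full query, any extra non-aggregated column left in the GROUP BY (contact details, salesperson) still splits a customer into several rows if those values differ.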