How do I remove duplicates from the MAXMIND database? - sql

I am currently working on the MaxMind database, specifically the GeoIPCity-Location table, and realised that hundreds of thousands of its rows are duplicates. For example, I've searched for Newcastle Upon Tyne with:
SELECT * FROM MAXMIND
WHERE city='Newcastle Upon Tyne';
Which results in:
369137 GB I7 Newcastle Upon Tyne NE20 54.9881 -1.6194
369332 GB I7 Newcastle Upon Tyne NE6 54.9881 -1.6194
369345 GB I7 Newcastle Upon Tyne NE13 54.9881 -1.6194
369355 GB I7 Newcastle Upon Tyne NE3 54.9881 -1.6194
369356 GB I7 Newcastle Upon Tyne NE5 54.9881 -1.6194
369645 GB I7 Newcastle Upon Tyne NE4 54.9881 -1.6194
369706 GB I7 Newcastle Upon Tyne NE15 54.9881 -1.6194
369959 GB I7 Newcastle Upon Tyne NE12 54.9881 -1.6194
370114 GB I7 Newcastle Upon Tyne NE27 54.9881 -1.6194
This is worse if I just search for "Newcastle":
382 ZA 2 Newcastle -27.758 29.9318
2279 US OK Newcastle 73065 35.2323 -97.6008
26459 US CA Newcastle 95658 38.873 -121.1543
22382 CA ON Newcastle l1b1j9 43.9167 -78.5833
38995 AU 2 Newcastle -32.9278 151.7845
40025 US ME Newcastle 4553 44.0438 -69.5675
47937 GB I7 Newcastle 54.9881 -1.6194
119830 US ME Newcastle 4553 44.0438 -69.5675
119982 US NE Newcastle 68757 42.6475 -96.9232
115052 US CA Newcastle 95658 38.873 -121.1543
120603 US NE Newcastle 68757 42.6475 -96.9232
127931 US OK Newcastle 73065 35.2323 -97.6008
136726 CA ON Newcastle 43.9167 -78.5833
136915 US TX Newcastle 76372 33.245 -98.9103
137128 US WY Newcastle 82701 43.8396 -104.5681
137130 US WY Newcastle 82701 43.8396 -104.5681
144799 IE 16 Newcastle 52.4492 -9.0611
207626 US UT Newcastle 84756 37.6924 -113.6272
213128 US UT Newcastle 84756 37.6924 -113.6272
221968 KN 5 Newcastle 17.2 -62.5833
229237 CA ON Newcastle l1b1l4 43.9167 -78.5833
232235 GB M9 Newcastle 54.9881 -1.6194
229005 CA ON Newcastle l1b1g1 43.9167 -78.5833
228722 CA ON Newcastle l1b1k9 43.9167 -78.5833
242102 CA NB Newcastle 46.9833 -65.5667
252530 IE 7 Newcastle 53.3011 -6.5022
267662 CA ON Newcastle l1b1h5 43.9167 -78.5833
266263 IE 31 Newcastle 53.0683 -6.0658
313271 CA ON Newcastle l1b1h8 43.9167 -78.5833
306632 IE 10 Newcastle 53.3408 -8.6783
320151 CA ON Newcastle l1b1l2 43.9167 -78.5833
336868 GB L6 Newcastle 52.4333 -3.1167
353050 GB R9 Newcastle 54.2 -5.8833
349314 GB Q7 Newcastle 54.3833 -5.4667
377107 ZA 2 Newcastle 2940 -27.758 29.9318
369587 GB I7 Newcastle ST5 54.9881 -1.6194
387865 AU 2 Newcastle 2297 -32.9278 151.7845
387889 AU 2 Newcastle 2300 -32.9278 151.7845
387978 AU 2 Newcastle 2306 -32.9278 151.7845
388098 AU 2 Newcastle 2302 -32.9278 151.7845
392595 AU 2 Newcastle 2296 -32.9278 151.7845
388376 AU 2 Newcastle 2899 -32.9278 151.7845
Is there any way to remove the duplicates? How would I know which duplicate to remove and which ones to keep?
Thanks
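One thing to notice before deleting anything: these rows are not exact duplicates. Within one city they differ by postal code, and MaxMind simply assigns the same city-level coordinates to every postal code. Also, the first column looks like the location ID that the GeoIP Blocks table references, so removing rows here will orphan those IP ranges unless you remap them first. If you still want one row per (country, region, city, latitude, longitude), keeping the lowest ID, a sketch along these lines should work (the column names are guesses based on the output above):
DELETE FROM MAXMIND
WHERE id NOT IN (
    SELECT MIN(id)
    FROM MAXMIND
    GROUP BY country, region, city, latitude, longitude
);
In MySQL you would need to wrap the subquery in a derived table, since it will not let you select from the same table you are deleting from.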

Related

cross join returning date duplicates

I applied a cross join to get a desired number of rows (4 rows per computer and chip). This works fine but when I introduce a date column, my cross join query blows up.
Before introducing the date column:
select t1.computer, t1.chip, transactions
from generate_series(1, 4) as transactions cross join
     (select distinct computer, chip
      from the_table
     ) t1
order by 1, 3
computer chip transactions
dell intel 1
dell intel 2
dell intel 3
dell intel 4
lenovo samsung 1
lenovo samsung 2
lenovo samsung 3
lenovo samsung 4
Good up to this part!
When I add a date column, the query blows up and more or less just produces duplicates:
select t1.computer, t1.chip, t1.date_purchased, transactions
from generate_series(1, 4) as transactions cross join
     (select distinct computer, chip, date_purchased
      from the_table
     ) t1
order by 1, 3, 4
computer chip date_purchased transactions
dell intel 5/11/21 1
dell intel 5/11/21 2
dell intel 5/11/21 3
dell intel 5/11/21 4
dell intel 5/12/21 1
dell intel 5/12/21 2
dell intel 5/12/21 3
dell intel 5/12/21 4
dell intel 5/13/21 1
dell intel 5/13/21 2
dell intel 5/13/21 3
dell intel 5/13/21 4
lenovo samsung 5/17/21 1
lenovo samsung 5/17/21 2
lenovo samsung 5/17/21 3
lenovo samsung 5/17/21 4
lenovo samsung 5/18/21 1
lenovo samsung 5/18/21 2
lenovo samsung 5/18/21 3
lenovo samsung 5/18/21 4
What I am attempting to get:
computer chip date_purchased transactions
dell intel 5/11/21 1
dell intel 5/12/21 2
dell intel 5/13/21 3
dell intel null 4
lenovo samsung 5/17/21 1
lenovo samsung 5/18/21 2
lenovo samsung null 3
lenovo samsung null 4
If no date is available for one of the listed transaction numbers, then date_purchased should be null. Is there any way to get my intended result?
I speculate that you want four rows per computer/chip combination, with different dates where they are available. If so:
select computer, chip, tt.date_purchased, transactions
from generate_series(1, 4) gs(transactions) cross join
     (select distinct computer, chip
      from the_table
     ) cc left join
     (select tt.*,
             row_number() over (partition by computer, chip order by date_purchased) as transactions
      from the_table tt
     ) tt
     using (computer, chip, transactions)
order by 1, 3
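If you want to test this, a minimal setup matching the sample output might look like the following (table and column names are taken from the question; the literal dates are read off the output above):
create table the_table (computer text, chip text, date_purchased date);
insert into the_table values
    ('dell',   'intel',   date '2021-05-11'),
    ('dell',   'intel',   date '2021-05-12'),
    ('dell',   'intel',   date '2021-05-13'),
    ('lenovo', 'samsung', date '2021-05-17'),
    ('lenovo', 'samsung', date '2021-05-18');
The left join is what produces the null dates: transaction numbers 1 through 4 always come from generate_series, but only the ones that match a row number over the actual purchase dates pick up a date.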

How to build reports in Oracle SQL developer

I need to generate 10 business reports based on the following tables and I'm having some trouble. I know I do not yet have enough for 10 reports, but I am having trouble getting started.
CARE_CENTER
Care Center ID|Care Center Name |Care Center Address |Nurse In Charge
------------------+--------------------------+-----------------------------------+--------------------
11111 |Centers Health Care |4770 White Plains Rd, Bronx, NY |Nurse Johnson
22222 |Bronx Urgent Care |1733 E 174th Street, Bronx, NY |Nurse Robinson
33333 |BronxCare Special Care Center|1265 Fulton Avenue, Bronx, NY |Nurse Jones
44444 |Gold Crest Care Center |2316 Bruner Avenue, Bronx, NY |Nurse Gonzalez
55555 |Regeis Care Center |3200 Baychester Avenue, Bronx, NY |Nurse Waterman
66666 |MedCarePlus |1643 Westchester Avenue, Bronx, NY |Nurse Connor
77777 |ArchCare Senior Center |900 Intervale, Avenue, Bronx, NY |Nurse Rodriguez
88888 |Bronx Center for Rehab |1010 Underhill Avenu, Bronx, NY |Nurse Morales
VISIT_CARE_CENTER
Patient ID|Visit Number|Care Center ID
-------------+-----------------+-----------------
1122 |78945 |11111
2233 |89123 |22222
3344 |91456 |33333
4455 |64981 |44444
5566 |12065 |55555
6677 |98106 |66666
7788 |40169 |77777
8899 |26013 |88888
Volunteer_Assigned_Care_Center
Volunteer ID|Care Center ID
----------------+------------------
12333 |11111
23444 |22222
34555 |33333
45666 |44444
56777 |55555
67888 |66666
78999 |77777
89000 |88888
VOLUNTEER
Volunteer ID|Interest ID
---------------+-------------
12333 |00001
23444 |00002
34555 |00003
45666 |00004
56777 |00005
67888 |00006
78999 |00007
89000 |00008
INTEREST
Interest ID|Interest Description
--------------+--------------------
00001 |Organzing
00002 |Coordinating
00003 |Daily Activites
00004 |Assisting with fundraising
00005 |Planning Special Events
00006 |Feeding Patients
00007 |Cleaning Social Rooms
00008 |Caring for Visitors
I need to generate a report that shows the Care centers and the volunteer relationships.
How would I write the SQL query to generate this report based on the above table structure?
You basically need to join the tables first:
Select c.CareCenterID,
       c.CareCenterName,
       c.CareCenterAddress,
       c.NurseInCharge,
       v.VolunteerID,
       i.InterestID,
       i.InterestDescription
From CARE_CENTER c
Join Volunteer_Assigned_Care_Center va
  On c.CareCenterID = va.CareCenterID
Join VOLUNTEER v
  On va.VolunteerID = v.VolunteerID
Join INTEREST i
  On v.InterestID = i.InterestID
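From there, the other reports follow the same joins with aggregation on top. For example, a sketch of a volunteers-per-care-center report (column names assumed to match the ones above):
Select c.CareCenterID,
       c.CareCenterName,
       Count(va.VolunteerID) As VolunteerCount
From CARE_CENTER c
Left Join Volunteer_Assigned_Care_Center va
  On c.CareCenterID = va.CareCenterID
Group By c.CareCenterID, c.CareCenterName
Order By VolunteerCount Desc
The Left Join keeps care centers with no volunteers in the result, showing them with a count of zero.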

extracting data using beautifulsoup from wiki

I'm pretty new to this.
What I am trying to accomplish is a table with districts and their various neighborhoods, but my final code just lists all the neighborhoods without assigning them to a specific district.
url = "https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Toronto"
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')
type(soup)
print(soup.prettify())
Toronto_table = soup.find('table',{'class':'wikitable sortable'})
links = Toronto_table.find_all('a')
neighborhoods = []
for link in links:
neighborhoods.append(link.get('title'))
print(neighborhoods)
df_neighborhoods = pd.DataFrame(neighborhoods)
df_neighborhoods
You can simply use read_html, which returns a list of all the tables on the page, and print the one you need:
import pandas as pd
f_states=pd.read_html('https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Toronto')
print(f_states[6])
Output:
District Number Neighbourhoods Included
0 C01 Downtown, Harbourfront, Little Italy, Little P...
1 C02 The Annex, Yorkville, South Hill, Summerhill, ...
2 C03 Forest Hill South, Oakwood–Vaughan, Humewood–C...
3 C04 Bedford Park, Lawrence Manor, North Toronto, F...
4 C06 North York, Clanton Park, Bathurst Manor
5 C07 Willowdale, Newtonbrook West, Westminster–Bran...
6 C08 Cabbagetown, St. Lawrence Market, Toronto wate...
7 C09 Moore Park, Rosedale
8 C10 Davisville Village, Midtown Toronto, Lawrence ...
9 C11 Leaside, Thorncliffe Park, Flemingdon Park
10 C13 Don Mills, Parkwoods–Donalda, Victoria Village
11 C14 Newtonbrook East, Willowdale East
12 C15 Hillcrest Village, Bayview Woods – Steeles, Ba...
13 E01 Riverdale, Danforth (Greektown), Leslieville
14 E02 The Beaches, Woodbine Corridor
15 E03 Danforth (Greektown), East York, Playter Estat...
16 E04 The Golden Mile, Dorset Park, Wexford, Maryval...
17 E05 Steeles, L'Amoreaux, Tam O'Shanter – Sullivan
18 E06 Birch Cliff, Oakridge, Hunt Club, Cliffside
19 E08 Scarborough Village, Cliffcrest, Guildwood, Eg...
20 E09 Scarborough City Centre, Woburn, Morningside, ...
21 E10 Rouge (South), Port Union (Centennial Scarboro...
22 E11 Rouge (West), Malvern
23 W01 High Park, South Parkdale, Swansea, Roncesvall...
24 W02 Bloor West Village, Baby Point, The Junction (...
25 W03 Keelesdale, Eglinton West, Rockcliffe–Smythe, ...
26 W04 York, Glen Park, Amesbury (Brookhaven), Pelmo ...
27 W05 Downsview, Humber Summit, Humbermede (Emery), ...
28 W06 New Toronto, Long Branch, Mimico, Alderwood
29 W07 Sunnylea (The Queensway – Humber Bay)
30 W08 The Kingsway, Central Etobicoke, Eringate – Ce...
31 W09 Kingsview Village-The Westway, Richview (Willo...
32 W10 Rexdale, Clairville, Thistletown - Beaumond He...

SQL Results Missing When Adding Columns

Trying to figure out an issue that is causing my SQL Server query to return no results. I have a query which calls out POs where there is more than one unique 'requested delivery date' at the size level on a single PO. I do this using COUNT and DISTINCT. It works perfectly until I add the fields 'PO_ITEM_NUMBER' and 'REQ_DELIV_DATE', which the business requested. I am not sure why that would cause an issue. For reference, our PO tables are tiered Header, Item, Size, with Size being the most granular, and they are SAP based.
Query:
SELECT E.TEAM_MEMBER_NAME [EMPLOYEE],
H.PO_TYPE,
H.PO_ISSUE_DATE,
S.PO_NUMBER,
S.MATERIAL,
M.DESCRIPTION,
H.PO_ORDERED_QUANTITY [PO_QUANTITY], -- if you use SUM(S.PO_ORDERED_QUANTITY) you get more results but wrong totals
K.BUSINESS_SEGMENT_DESC,
S.PO_REQ_DELIV_DATE,
S.PO_ITEM_NUMBER
FROM PDX_SAP_USER..VW_PO_SIZE S --- you can use .. instead of .dbo.
JOIN ADI_USER_MAINTAINED..SCM_PO_Employee_Name E --- join the po to employee assignment table
ON S.PO_NUMBER = E.PO_NUMBER
JOIN PDX_SAP_USER..VW_PO_HEADER H
ON E.PO_NUMBER = H.PO_NUMBER
JOIN PDX_SAP_USER..VW_PO_ITEM I
ON E.PO_NUMBER = I.PO_NUMBER
JOIN PDX_SAP_USER..VW_MM_MATERIAL M
ON E.MATERIAL = M.MATERIAL
JOIN PDX_SAP_USER..vw_kd_BUSINESS_SEGMENT K
ON M.BUSINESS_SEGMENT_CODE = K.BUSINESS_SEGMENT_CODE
WHERE I.PO_BALANCE_QUANTITY > 0 ---exclude any fully received PO's
AND NOT EXISTS (SELECT * FROM VW_PO_ITEM I1 WHERE DEL_INDICATOR = 'L' AND I.PO_NUMBER = I1.PO_NUMBER)
GROUP BY S.PO_NUMBER,
E.TEAM_MEMBER_NAME,
H.PO_TYPE,
H.PO_ISSUE_DATE,
S.MATERIAL,
M.DESCRIPTION,
K.BUSINESS_SEGMENT_DESC,
H.PO_ORDERED_QUANTITY,
S.PO_REQ_DELIV_DATE,
S.PO_ITEM_NUMBER
HAVING COUNT(DISTINCT S.PO_REQ_DELIV_DATE) > 1
ORDER BY S.PO_NUMBER
Here is the query that works, along with its results:
SELECT E.TEAM_MEMBER_NAME [EMPLOYEE],
H.PO_TYPE,
CONVERT(VARCHAR(12),H.PO_ISSUE_DATE,101) [PO_ISSUE_DATE],
S.PO_NUMBER,
S.MATERIAL,
M.DESCRIPTION,
H.PO_ORDERED_QUANTITY [PO_QUANTITY], --- if you use SUM(S.PO_ORDERED_QUANTITY) - you get more results but wrong totals
K.BUSINESS_SEGMENT_DESC
FROM PDX_SAP_USER..VW_PO_SIZE S --- you can use .. instead of .dbo.
JOIN ADI_USER_MAINTAINED..SCM_PO_Employee_Name E --- join the po to employee assignment table
ON S.PO_NUMBER = E.PO_NUMBER
JOIN PDX_SAP_USER..VW_PO_HEADER H
ON E.PO_NUMBER = H.PO_NUMBER
JOIN PDX_SAP_USER..VW_PO_ITEM I
ON E.PO_NUMBER = I.PO_NUMBER
JOIN PDX_SAP_USER..VW_MM_MATERIAL M
ON E.MATERIAL = M.MATERIAL
JOIN PDX_SAP_USER..vw_kd_BUSINESS_SEGMENT K
ON M.BUSINESS_SEGMENT_CODE = K.BUSINESS_SEGMENT_CODE
WHERE I.PO_BALANCE_QUANTITY > 0 ---exclude any fully received PO's
AND NOT EXISTS (SELECT * FROM VW_PO_ITEM I1 WHERE DEL_INDICATOR = 'L' AND I.PO_NUMBER = I1.PO_NUMBER)
GROUP BY S.PO_NUMBER,
E.TEAM_MEMBER_NAME,
H.PO_TYPE,
H.PO_ISSUE_DATE,
S.MATERIAL,
M.DESCRIPTION,
K.BUSINESS_SEGMENT_DESC,
H.PO_ORDERED_QUANTITY
HAVING COUNT(DISTINCT S.PO_REQ_DELIV_DATE) > 1
ORDER BY S.PO_NUMBER
Results:
EMPLOYEE PO_TYPE PO_ISSUE_DATE PO_NUMBER MATERIAL DESCRIPTION PO_QUANTITY BUSINESS_SEGMENT_DESC
------------------------------ ------- ------------- ---------- ------------------ ---------------------------------------- --------------------------------------- ----------------------------------------------------------------------------------------------------
Christopher Olson NB 01/19/2017 0282238419 CD7078 ESS 3S PANT WVN 2054 CORE APP MEN SPORT ADIDAS
Juan Gomez NB 02/23/2017 0282524995 S98775 ESS LIN P/O FT 103 CORE APP MEN SPORT ADIDAS
Christopher Olson NB 03/09/2017 0282598957 BK7410 ESS LGO T P SJ 619 ATHLETICS APP MEN ADIDAS
Juan Gomez NB 03/28/2017 0282706115 S97155 ESS LIN TIGHT 961 CORE APP WOMEN SPORT ADIDAS
Juan Gomez NB 09/21/2017 0283752965 CF8152 BOS LABEL 7900 ATHLETICS APP MEN ADIDAS
Julie Lange-May 12 10/02/2017 0283796594 DQ1421 WOVEN JACKET W 1020 ATHLETICS APP WOMEN ADIDAS
Kekai Ariola NB 10/10/2017 0283837426 AC7366 PW HU HOLI Tennis Hu MC 5655 STATEMENT FTW ADIDAS
Cody Lofquist NB 11/10/2017 0283944933 DB2061 PREDATOR TANGO 18.1 TR 1756 FOOTBALL FTW ADIDAS
Andrew Zapata 05 11/13/2017 0283961402 CG6440 NEMEZIZ 18.1 FG W 543 FOOTBALL FTW ADIDAS
Christopher Olson NB 11/20/2017 0283981666 CV7748 ASSITA 17 GK Y 1648 FOOTBALL APP GENERIC ADIDAS
Cody Lofquist NB 11/21/2017 0283984539 DB2165 COPA 18.1 FG 501 FOOTBALL FTW ADIDAS
Julie Lange-May NB 11/26/2017 0284043157 CE4368 I GRPHC STSET 1333 ORIGINALS APP KIDS ADIDAS
Trey Pflug NB 11/27/2017 0284048754 CQ3168 SOLAR BOOST M 3500 RUNNING FTW MEN ADIDAS
Dave Laws NB 11/28/2017 0284059045 DB2966 YEEZY 500 15334 YEEZY FTW ADIDAS
Dave Laws NB 11/28/2017 0284059047 DB2966 YEEZY 500 12584 YEEZY FTW ADIDAS
Christopher Olson NB 12/06/2017 0284094060 BJ9165 TASTIGO17 SHO W 7522 FOOTBALL APP GENERIC ADIDAS
Christopher Olson NB 12/06/2017 0284094212 BK0350 TIRO17 TRG PNTW 7091 FOOTBALL APP GENERIC ADIDAS
Cody Lofquist NB 12/08/2017 0284107301 DB2062 PREDATOR TANGO 18.1 TR 2110 FOOTBALL FTW ADIDAS
Trey Pflug NB 12/11/2017 0284115640 BC0674 SOLAR BOOST W 1752 RUNNING FTW WOMEN ADIDAS
Kim Moreland NB 12/12/2017 0284137355 DJ3033 D2M K SHT 1730 CORE APP WOMEN SPORT ADIDAS
Cody Lofquist NB 12/12/2017 0284141196 DB2126 PREDATOR TANGO 18.3 IN 1988 FOOTBALL FTW ADIDAS
Cody Lofquist NB 12/12/2017 0284141253 AQ0612 NEMEZIZ MESSI TANGO 18.3 TF 526 FOOTBALL FTW ADIDAS
Dave Laws NB 12/15/2017 0284170426 DB2966 YEEZY 500 2918 YEEZY FTW ADIDAS
Cody Lofquist NB 12/16/2017 0284174671 DB2248 X 18.1 FG 668 FOOTBALL FTW ADIDAS
Cody Lofquist NB 12/16/2017 0284174673 DB2039 PREDATOR 18.1 FG 489 FOOTBALL FTW ADIDAS
Christopher Olson ER 12/20/2017 0284207872 BS4250 TASTIGO17 SHO 404 FOOTBALL APP GENERIC ADIDAS
Ben Paul NB 12/19/2017 0284208137 CG0584 REAL A JSY 811 FOOTBALL APP LICENSED ADIDAS
Julie Lange-May NB 01/07/2018 0284316616 DN4273 UAS BEANIE 120 ORIGINALS APP MEN ADIDAS
Cody Lofquist NB 01/08/2018 0284319552 DB2063 PREDATOR TANGO 18.1 TR 2001 FOOTBALL FTW ADIDAS
Cody Lofquist NB 01/19/2018 0284464341 DB2214 X 18+ FG 582 FOOTBALL FTW ADIDAS
Cody Lofquist NB 01/19/2018 0284464343 DB2013 PREDATOR 18+ FG 2201 FOOTBALL FTW ADIDAS
Cody Lofquist NB 01/19/2018 0284464344 DB2072 NEMEZIZ 18+ FG 1467 FOOTBALL FTW ADIDAS
Cody Lofquist NB 01/19/2018 0284464346 DB2251 X 18.1 FG 620 FOOTBALL FTW ADIDAS
Cody Lofquist NB 01/19/2018 0284464348 DB2167 COPA 18.1 FG 1714 FOOTBALL FTW ADIDAS
Cody Lofquist NB 01/19/2018 0284464349 DB2089 NEMEZIZ MESSI 18.1 FG 988 FOOTBALL FTW ADIDAS
Cody Lofquist NB 01/19/2018 0284464350 DB2040 PREDATOR 18.1 FG 2061 FOOTBALL FTW ADIDAS
Cody Lofquist NB 01/19/2018 0284465944 DB2001 PREDATOR 18.3 FG 7008 FOOTBALL FTW ADIDAS
Cody Lofquist NB 01/23/2018 0284489924 772109 SAMBA CLASSIC 419 FOOTBALL FTW ADIDAS
Andrew Zapata 05 02/02/2018 0284539184 DH3869 CLIMA 3.0 TEE 1853 ACTION SPORTS APP ADIDAS
Cody Lofquist NB 02/06/2018 0284550445 BB0571 Goletto VI FG J 5562 FOOTBALL FTW ADIDAS
Cody Lofquist NB 02/24/2018 0284666220 DM2092 MLS ASG OMB 424 FOOTBALL ACC HW ADIDAS
Christopher Olson NB 01/19/2018 0284666914 BP9111 D2M 3S SHORT 11811 CORE APP MEN SPORT ADIDAS
Cody Lofquist NB 02/27/2018 0284684097 019228 MUNDIAL TEAM 657 FOOTBALL FTW ADIDAS
Cody Lofquist NB 03/06/2018 0284704098 CD4683 GENERICWCBOX 80000 FOOTBALL ACC HW ADIDAS
Kekai Ariola NB 03/08/2018 0284728508 BB7619 Sobakov 1775 ORIGINALS FTW MEN ADIDAS
Kim Moreland NB 03/08/2018 0284730274 BP9733 ULT SS T 2557 TRAINING APP MEN ADIDAS
Kekai Ariola 05 03/27/2018 0284865999 B37532 EQT SUPPORT SK PK W 347 ORIGINALS FTW WOMEN ADIDAS
Kekai Ariola 05 03/27/2018 0284866000 B37545 EQT SUPPORT SK PK W 357 ORIGINALS FTW WOMEN ADIDAS
Kim Moreland NB 04/06/2018 0284914322 DH3591 Tech Tee 10042 TRAINING APP WOMEN ADIDAS
Cody Lofquist NB 04/10/2018 0284930265 CW5627 Pred FS JR MN 1651 FOOTBALL ACC HW ADIDAS
Kekai Ariola NB 04/10/2018 0284930449 B41794 PW TENNIS HU 315 ORIGINALS FTW MEN ADIDAS
(51 row(s) affected)
If you added the HAVING clause recently, comment it out and check.
If not, then change the INNER JOINs to LEFT JOINs from the bottom up, one at a time, executing the query and checking the results after each change.
This way you can pinpoint which join is filtering out your rows.
Without sample data for all the tables used in the query, it is difficult to give a definitive answer.
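For example, the last join would temporarily become:
LEFT JOIN PDX_SAP_USER..vw_kd_BUSINESS_SEGMENT K
    ON M.BUSINESS_SEGMENT_CODE = K.BUSINESS_SEGMENT_CODE
and so on upward until rows start coming back.
Another way is to compute the qualifying PO numbers first and then join back to the detail tables, so the grouping never has to touch the extra columns: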
WITH POsToReturn AS (
SELECT S.PO_NUMBER
FROM PDX_SAP_USER..VW_PO_SIZE S
JOIN PDX_SAP_USER..VW_PO_ITEM I
ON I.PO_NUMBER = S.PO_NUMBER
WHERE I.PO_BALANCE_QUANTITY > 0
GROUP BY S.PO_NUMBER
HAVING COUNT(DISTINCT S.PO_REQ_DELIV_DATE) > 1
AND COUNT(CASE WHEN I.DEL_INDICATOR = 'L' THEN 1 END) = 0
)
SELECT <your columns>
FROM POsToReturn P
JOIN PDX_SAP_USER..VW_PO_SIZE S
ON S.PO_NUMBER = P.PO_NUMBER
... <join the rest of the tables for the detail columns>
-- leave out the entire group by!
Or you could go with an approach like this:
with data as (
    select
        -- SQL Server does not allow DISTINCT inside a window aggregate,
        -- so the dense_rank trick counts the distinct dates per PO instead
        dense_rank() over (partition by S.PO_NUMBER order by S.PO_REQ_DELIV_DATE)
        + dense_rank() over (partition by S.PO_NUMBER order by S.PO_REQ_DELIV_DATE desc)
        - 1 as rdd_count
        <insert rest of main query>
)
select ... from data
where rdd_count > 1
order by PO_NUMBER;
As for understanding why, I think you've seen other queries where people added columns to a group by clause so they could get around a mysterious error message about non-aggregates. If not that, then you've seen systems that confusingly permit non-standard behavior and give results without erroring at all.
Usually what people want to accomplish is something like this: I'm grouping on Customer ID already, but I want Customer Name in the results too. So they add that extra column to the 'group by' list and everything works fine. But if you think about it, the reason it works is that the new column didn't change the groups at all, since each Customer ID always has the same Customer Name; it's ultimately just an easy way to get rid of the error. In your query that's not true, though: you do have more than one date in the groups you care about.
In my opinion it's better to use dummy aggregates like min(Customer Name) as CustomerName. Remember an aggregate function's purpose is collapsing multiple values into a single value. When necessary some systems will just pick a value at random, without warning you it did that. Many MySQL and Sybase developers got burned by this when they relied on this quirky behavior and/or never really learned how it's supposed to work.
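A minimal sketch of that pattern, using a hypothetical Orders table that carries a denormalized CustomerName column:
select CustomerID,
       min(CustomerName) as CustomerName, -- dummy aggregate: every row in the group has the same name anyway
       count(*) as OrderCount
from Orders
group by CustomerID;
This states your assumption explicitly (one name per ID) and behaves the same on every database, instead of relying on a quirk.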
Also remember that in general a group is a set of multiple rows collapsed into just one row. For count(distinct) to work it needed to see multiple dates across a single group. But that conflicts with your need to keep the output as separate rows.
Essentially the row by itself doesn't give you enough information to decide whether to keep it. I solved that by using a second query to compute the list of PO_NUMBERs meeting the criteria and then using that as a filter (via an inner join).
In the second example I used a window function instead, which lets you look outside each row without the row-collapsing behavior of group by. Both of these basically let you do what the HAVING clause was intended to accomplish.

Count and Avg in query to determine if result is greater than x

I have taken the chance to learn SQL. I have a basic table named book. I have been presented with this scenario: display the book category, and the number of books in each book category, for books with an average price greater than 15 dollars. My query below is not working. How can I count each book type and get the average price to see if it is greater than 15?
Query
SELECT type FROM books WHERE COUNT(DISTINCT type) * SUM(price) > 15;
Table Schema
Select * From Book;
BOOK TITLE PUB TYP PRICE P
---- ---------------------------------------- --- --- ---------- -
0180 A Deepness in the Sky TB SFI 7.19 Y
0189 Magic Terror FA HOR 7.99 Y
0200 The Stranger VB FIC 8 Y
0378 Venice SS ART 24.5 N
079X Second Wind PU MYS 24.95 N
0808 The Edge JP MYS 6.99 Y
1351 Dreamcatcher: A Novel SC HOR 19.6 N
1382 Treasure Chests TA ART 24.46 N
138X Beloved PL FIC 12.95 Y
2226 Harry Potter and the Prisoner of Azkaban ST SFI 13.96 N
2281 Van Gogh and Gauguin WP ART 21 N
2766 Of Mice and Men PE FIC 6.95 Y
2908 Electric Light FS POE 14 N
3350 Group: Six People in Search of a Life BP PSY 10.4 Y
3743 Nine Stories LB FIC 5.99 Y
3906 The Soul of a New Machine BY SCI 11.16 Y
5163 Travels with Charley PE TRA 7.95 Y
5790 Catch-22 SC FIC 12 Y
6128 Jazz PL FIC 12.95 Y
6328 Band of Brothers TO HIS 9.6 Y
669X A Guide to SQL CT CMP 37.95 Y
6908 Franny and Zooey LB FIC 5.99 Y
7405 East of Eden PE FIC 12.95 Y
7443 Harry Potter and the Goblet of Fire ST SFI 18.16 N
7559 The Fall VB FIC 8 Y
8092 Godel, Escher, Bach BA PHI 14 Y
8720 When Rabbit Howls JP PSY 6.29 Y
9611 Black House RH HOR 18.81 N
9627 Song of Solomon PL FIC 14 Y
9701 The Grapes of Wrath PE FIC 13 Y
9882 Slay Ride JP MYS 6.99 Y
9883 The Catcher in the Rye LB FIC 5.99 Y
9931 To Kill a Mockingbird HC FIC 18 N
When you want to filter on aggregated values, you should GROUP BY and then put the condition in a HAVING clause. The scenario asks for the average price per category, so the test belongs in HAVING AVG(price), and COUNT(*) gives the number of books in each category:
SELECT type, COUNT(*) AS num_books
FROM books
GROUP BY type
HAVING AVG(price) > 15;
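Run against the sample Book data above, this should return three categories: ART (3 books, average price about 23.32), HOR (3 books, about 15.47), and CMP (1 book, 37.95).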