How to match data with an uploaded file in SQL Server

I have uploaded a file containing a list of cities called [Sandbox].[dbo].[Cities for Special Project]. I am trying to find out how much revenue we are collecting from the cities in my list.
I can find out how much revenue we collect from each city that's already in my DB, but I am unsure how to match that to the file that I uploaded.
Select individual.[vchCity] as City
, sum(staging.[Price_PerLicense]) as Total
From Engine1.DB1.[dbo].[Individual] individual
Join
Engine2.DB2.[dbo].[Daily_License_Report_Detail] staging on individual.[iIndividualId] = staging.[OwnerId]
Group by individual.[vchCity]
How would I match what I have to the cities that are in my uploaded file?

with CityTotal as(
Select individual.[vchCity] as City, sum(staging.[Price_PerLicense]) as Total
From Engine1.DB1.[dbo].[Individual] individual
Join Engine2.DB2.[dbo].[Daily_License_Report_Detail] staging on individual.[iIndividualId] = staging.[OwnerId]
Group by individual.[vchCity]
)
Select cs.city, ct.total
from [Sandbox].[dbo].[Cities for Special Project] cs
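left join CityTotal ct on ct.city = cs.city
This is one way to do it. The left join keeps every city from the uploaded list; cities with no revenue get a NULL total.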
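The answer above can be tried on toy data. The following is a minimal sketch in Python with the standard `sqlite3` module, not SQL Server itself: the table name `CitiesForSpecialProject`, the sample rows, and the helper `revenue_for_listed_cities` are assumptions made for illustration, and the cross-server prefixes are dropped.

```python
import sqlite3


def revenue_for_listed_cities(conn):
    """Return (city, total) for every city in the uploaded list."""
    query = """
        WITH CityTotal AS (
            SELECT i.vchCity AS City, SUM(s.Price_PerLicense) AS Total
            FROM Individual i
            JOIN Daily_License_Report_Detail s
              ON i.iIndividualId = s.OwnerId
            GROUP BY i.vchCity
        )
        SELECT cs.City, ct.Total
        FROM CitiesForSpecialProject cs          -- hypothetical upload table
        LEFT JOIN CityTotal ct ON ct.City = cs.City
        ORDER BY cs.City
    """
    return conn.execute(query).fetchall()


# Invented sample data, only to exercise the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Individual (iIndividualId INTEGER, vchCity TEXT);
    CREATE TABLE Daily_License_Report_Detail (OwnerId INTEGER, Price_PerLicense REAL);
    CREATE TABLE CitiesForSpecialProject (City TEXT);
    INSERT INTO Individual VALUES (1, 'Austin'), (2, 'Austin'), (3, 'Boise');
    INSERT INTO Daily_License_Report_Detail VALUES (1, 10.0), (2, 5.0), (3, 7.0);
    INSERT INTO CitiesForSpecialProject VALUES ('Austin'), ('Denver');
""")

rows = revenue_for_listed_cities(conn)
print(rows)
```

Boise has revenue but is not in the list, so it is dropped; Denver is in the list but has no revenue, so it appears with a NULL (`None`) total.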

Related

Counting between two different tables Oracle

I have two ORACLE tables, FOLDER and FILES. Each folder contains several files.
I am trying to get the distribution of file counts across folders, i.e. the number of folders x that contain y files.
For example 50 folders contain 10 files, 35 folders contain 8 files...
Can I get some help with the query, please:
select count(fl.id_folder) ,count(fi.fileID) from FOLDER fl inner join FILES fi on fl.id_folder=fi.fileID group by fl.id_folder;
You can use two levels of aggregation. Assuming that table files has a column called id_folder, you would do:
select cnt_files, count(*) cnt_folders
from (
select count(*) cnt_files
from files
group by id_folder
) t
group by cnt_files
We can write the query using group by as follows:
Select cnt_files, count(1) as num_of_folders
from
(select fl.id_folder, count(fi.fileid) as cnt_files
from FOLDER fl
Left join FILES fi on fl.id_folder=fi.id_folder
Group by fl.id_folder)
Group by cnt_files;
Note: I have used a LEFT JOIN to include all folders (with and without files in them), so empty folders are counted under cnt_files = 0.
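To see the two levels of aggregation at work, here is a small sketch using Python's built-in `sqlite3` rather than Oracle. It assumes, as the first answer does, that `files` has an `id_folder` column; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE folder (id_folder INTEGER PRIMARY KEY);
    CREATE TABLE files (fileID INTEGER PRIMARY KEY, id_folder INTEGER);
    INSERT INTO folder VALUES (1), (2), (3), (4);
    -- folders 1 and 2 hold two files each, folder 3 holds one, folder 4 is empty
    INSERT INTO files VALUES (10, 1), (11, 1), (12, 2), (13, 2), (14, 3);
""")

histogram = conn.execute("""
    SELECT cnt_files, COUNT(*) AS cnt_folders
    FROM (
        -- inner level: number of files per folder (0 for empty folders)
        SELECT fl.id_folder, COUNT(fi.fileID) AS cnt_files
        FROM folder fl
        LEFT JOIN files fi ON fi.id_folder = fl.id_folder
        GROUP BY fl.id_folder
    ) AS per_folder
    -- outer level: number of folders per file count
    GROUP BY cnt_files
    ORDER BY cnt_files
""").fetchall()

print(histogram)
```

Each output row reads as (number of files, number of folders holding that many files).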

Get content data from specific files from bigquery-public-data:github_repos different results with JOIN and WHERE

The most common way of getting content data for specific files in bigquery-public-data:github_repos by name is like this:
SELECT *
FROM [bigquery-public-data:github_repos.sample_contents]
WHERE id IN (SELECT id FROM (
SELECT *
FROM [bigquery-public-data:github_repos.sample_files]
WHERE path = 'README.md'
))
This query gives me 14557 results.
I thought that running the query below would give me the same number of results:
SELECT contents.*
FROM [bigquery-public-data:github_repos.sample_contents] contents
INNER JOIN [bigquery-public-data:github_repos.sample_files] files
ON contents.id = files.id
WHERE files.path = 'README.md'
But it ends up with 14645 results.
Why is there a difference between these two results, and which one is the proper one for selecting the content data of README.md files?
EDIT:
It looks like forked files without modification have the same id across other repos (forks).
The first query returns each content row whose id matches a file with path = 'README.md' exactly once, no matter how many times that file id is present in the files table.
The second query returns the same content row as many times as the respective file id is present in the files table, because of the JOIN.
You can run the query below to validate this; the distinct count should match the result of the first query:
SELECT EXACT_COUNT_DISTINCT(contents.id)
FROM [bigquery-public-data:github_repos.sample_contents] contents
INNER JOIN [bigquery-public-data:github_repos.sample_files] files
ON contents.id = files.id
WHERE files.path = 'README.md'
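The difference between the two forms can be reproduced on a tiny dataset. This is a sketch in Python's `sqlite3`, not BigQuery; the three sample file rows (one original, one unmodified fork sharing the same id, one unrelated repo) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contents (id TEXT PRIMARY KEY, body TEXT);
    CREATE TABLE files (repo TEXT, path TEXT, id TEXT);
    INSERT INTO contents VALUES ('a1', 'readme A'), ('b2', 'readme B');
    INSERT INTO files VALUES
        ('orig/repo',  'README.md', 'a1'),
        ('fork/repo',  'README.md', 'a1'),   -- unmodified fork, same id
        ('other/repo', 'README.md', 'b2');
""")

# IN (...) is a membership test: each content row appears at most once.
in_count = conn.execute("""
    SELECT COUNT(*) FROM contents
    WHERE id IN (SELECT id FROM files WHERE path = 'README.md')
""").fetchone()[0]

# JOIN produces one output row per matching files row.
join_count = conn.execute("""
    SELECT COUNT(*) FROM contents c
    JOIN files f ON c.id = f.id
    WHERE f.path = 'README.md'
""").fetchone()[0]

# Counting distinct ids removes the duplicates introduced by the JOIN.
distinct_count = conn.execute("""
    SELECT COUNT(DISTINCT c.id) FROM contents c
    JOIN files f ON c.id = f.id
    WHERE f.path = 'README.md'
""").fetchone()[0]

print(in_count, join_count, distinct_count)
```

If the goal is one row per distinct file content, the `IN` form (or the JOIN with de-duplication) is the one to use; the plain JOIN is appropriate when one row per repository file is wanted.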

finding people with possible incorrectly spelled cities where zip codes match

I am trying to create a report that will return a list of people whose cities most likely need to be corrected.
I was thinking of comparing the data against other data within the table to leverage the assumption that most of the cities are spelled correctly. Take Albuquerque, for example. We have records for many of the zip codes, but the city isn't always spelled correctly.
I can't figure out my next step.
Here's what I have started with:
SELECT city, zip_5_digits, COUNT(*) AS "COUNT"
FROM people
INNER JOIN addresses
ON addresses.people_id = people.id
AND city LIKE 'Albu%que'
GROUP BY city, zip_5_digits
Doing this results in
Albuqureque 87108 1
Albuquerque 87108 238
Albuqerque 87109 1
Albuquerque 87109 34
What I'd like to do is, for each row, find the maximum records where the zip code matches but the city does not match. If there is no match, I want to return that record, and I'll use this to return people's id and names, since I most likely need to correct the name of the city for those people who have it mis-spelled.
This is hard, because some "cities" have very few residents. And, some zip codes might just have a small part of a city.
I would recommend two rules:
Look at zip codes that have at least a certain number of people -- say 100.
Look at cities in the zip code that have less than some number -- say 5.
These are candidates for misspellings:
SELECT pa.*
FROM (SELECT city, zip_5_digits, COUNT(*) AS cnt,
MAX(COUNT(*)) OVER (PARTITION BY zip_5_digits) as max_cnt,
SUM(COUNT(*)) OVER (PARTITION BY zip_5_digits) as sum_cnt
FROM people p INNER JOIN
addresses a
ON a.people_id = p.id
GROUP BY city, zip_5_digits
) pa
WHERE sum_cnt >= 100 AND cnt <= 5;
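The two rules can be checked on sample data. The sketch below uses Python's `sqlite3` and, as a deliberate substitution, replaces the window functions with a plain aggregation joined back to per-zip totals so that it also runs on older SQLite builds. The `people` table is omitted and the rows (including the made-up 'Tinytown' zip) are invented; the thresholds 100 and 5 are the ones suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addresses (people_id INTEGER, city TEXT, zip_5_digits TEXT)"
)

# Invented sample: one busy zip with a single misspelling, one tiny zip.
sample = (
    [("Albuquerque", "87108")] * 238
    + [("Albuqureque", "87108")] * 1
    + [("Tinytown", "87999")] * 3
)
conn.executemany(
    "INSERT INTO addresses VALUES (?, ?, ?)",
    [(i, city, zip_code) for i, (city, zip_code) in enumerate(sample, start=1)],
)

candidates = conn.execute("""
    WITH city_zip AS (
        SELECT city, zip_5_digits, COUNT(*) AS cnt
        FROM addresses
        GROUP BY city, zip_5_digits
    ),
    zip_total AS (
        SELECT zip_5_digits, SUM(cnt) AS sum_cnt
        FROM city_zip
        GROUP BY zip_5_digits
    )
    SELECT cz.city, cz.zip_5_digits, cz.cnt
    FROM city_zip cz
    JOIN zip_total zt ON zt.zip_5_digits = cz.zip_5_digits
    WHERE zt.sum_cnt >= 100   -- rule 1: zip code has enough people
      AND cz.cnt <= 5         -- rule 2: city spelling is rare in that zip
""").fetchall()

print(candidates)
```

The rare spelling in the busy zip code is flagged, while the small zip code is skipped because it has too few people to judge.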

Combining tables in SQL (only for certain rows)

I wasn't even sure how to word this in the subject line. However, I have a SELECT statement that pulls information from an SQL view called CONTACTS (F_Name, L_Name, Address, Email, etc...). I have another view called INVOICES with purchase information for the members in the contacts view. Keep in mind that the INVOICES view has multiple purchases of varying products for each member. These two views can be linked with a ContactID key.
I only need to display a certain product (EADP Package), if purchased, from the INVOICES view on the same line as the member who purchased it. I also need to retain the entire member list in the pull. So, if I use a WHERE clause to only pull that product, It only gives me those who purchased that product. I need to keep the entire member list and still have a column that displays that particular product for those who purchased it. Hope that made sense.
Sorry, but there are 3 views, not 2. Here is what I have so far:
SELECT Contact.FirstName, Contact.LastName, Contact.CFSDesignation, Contact.EMailAddress1, Contact.Telephone1, Contact.DefaultPriceLevelIdName
FROM Contact INNER JOIN
FilteredInvoice ON Contact.ContactId = FilteredInvoice.contactid INNER JOIN
FilteredInvoiceDetail ON FilteredInvoice.invoiceid = FilteredInvoiceDetail.invoiceid
WHERE (Contact.DefaultPriceLevelIdName = 'member') AND FilteredInvoiceDetail.productidname = 'eAudiology 2014 Unlimited On-Demand Package'
This works fine, but only pulls those who purchased the package. I need the entire member list from CONTACTS (about 10,000 records) plus the product column showing the product above for those who purchased it. I believe it has something to do with joins, but I can't get my head around it.
Getting close, but it doesn't like the keyword "ON".
Also, in your previous answer, what are "C" and "I" used for?
SELECT Contact.FirstName, Contact.LastName, Contact.CFSDesignation, Contact.EMailAddress1, Contact.Telephone1, Contact.DefaultPriceLevelIdName
FROM Contact LEFT JOIN
FilteredInvoice ON Contact.ContactId = FilteredInvoice.contactid LEFT JOIN (SELECT FilteredInvoiceDetail.productidname
FROM FilteredInvoiceDetail
WHERE productidname = 'eAudiology 2014 Unlimited On-Demand Package') ON FilteredInvoice.invoiceid = FilteredInvoiceDetail.invoiceid
WHERE (Contact.DefaultPriceLevelIdName = 'member')
SELECT *
FROM Contacts C
LEFT JOIN ( SELECT *
FROM Invoices
WHERE Product = 'EADP Package') I
ON C.ContactID = I.ContactID
This finally worked for me:
SELECT DISTINCT Contact.ContactId,Contact.FirstName, Contact.LastName, Contact.CFSDesignation, Contact.EMailAddress1, Contact.DefaultPriceLevelIdName, productidname
FROM Contact LEFT JOIN
FilteredInvoice ON Contact.ContactId = FilteredInvoice.contactid LEFT JOIN (SELECT FilteredInvoiceDetail.productidname, FilteredInvoiceDetail.invoiceid
FROM FilteredInvoiceDetail
WHERE productidname = 'eAudiology 2014 Unlimited On-Demand Package' OR productidname = 'eAudiology 2014 Unlimited On-Demand Pkg Renewal') I ON FilteredInvoice.invoiceid = I.invoiceid
WHERE (Contact.DefaultPriceLevelIdName = 'member' OR DefaultPriceLevelIdName = 'student')
ORDER BY LastName
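The pattern that finally worked (LEFT JOIN against a filtered, aliased subquery) can be illustrated on a reduced schema. This is a sketch in Python's `sqlite3` using the simplified `contacts` and `invoices` tables from the short answer, with invented rows. The aliases `c` and `i` are just short names for the table and the subquery; the subquery alias is what lets the `ON` clause refer to its columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (ContactID INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE invoices (ContactID INTEGER, Product TEXT);
    INSERT INTO contacts VALUES (1, 'Adams'), (2, 'Baker'), (3, 'Clark');
    INSERT INTO invoices VALUES (1, 'EADP Package'), (1, 'Mug'), (2, 'Mug');
""")

# Filtering in WHERE after the join discards contacts without the product.
buyers_only = conn.execute("""
    SELECT c.LastName, i.Product
    FROM contacts c
    LEFT JOIN invoices i ON c.ContactID = i.ContactID
    WHERE i.Product = 'EADP Package'
    ORDER BY c.LastName
""").fetchall()

# Filtering inside the joined subquery keeps every contact.
all_members = conn.execute("""
    SELECT c.LastName, i.Product
    FROM contacts c
    LEFT JOIN (
        SELECT ContactID, Product
        FROM invoices
        WHERE Product = 'EADP Package'
    ) AS i ON c.ContactID = i.ContactID
    ORDER BY c.LastName
""").fetchall()

print(buyers_only)
print(all_members)
```

Contacts without the package still appear in the second result, with NULL (`None`) in the product column.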

MSSQLSRV - filtering out results with duplicate row

I'm having a frustrating issue with SQL Server. I need to create a view from a table containing details of files loaded through ETL. The table contains a file id (unique), filename, serverid (relating to the server it has been loaded onto).
The first 2 letters of the filename are a country code, e.g. US, UK, GB, DE, and there are multiple files loaded per country. I want to get the record with the highest file id for each country. The query below does this, but it returns the highest record PER SERVER, so there may be multiple file ids: it would return the highest file id for that country on server1 and on server2. I only want the highest record full stop.
I've played with an equivalent query on MySQL and got it working by commenting out the last line (GROUP BY t.[server_id]), which seemed to work fine, but of course MSSQLSRV needs all non-aggregates in the SELECT to be placed in the GROUP BY statement.
So, how can I get the same result in SQL Server - i.e. get one result, with the highest file_id, without getting a duplicate row for a different server_id?
Hope I'm making myself clear.
SELECT MAX(t.[file_id]) AS FID
,LEFT(t.[full_file_name], 2) AS COUNTRYCODE
,t.[server_id]
FROM [tracking_files] t
WHERE t.server_id IS NOT NULL
AND t.[server_id] = (
SELECT TOP 1 [server_id]
FROM [tracking_files] md
WHERE md.[file_id] = t.file_id
)
GROUP BY LEFT(t.[full_file_name], 2)
,t.[server_id]
EDIT:
Here is the sample data I've been playing with in MySQL, along with the result I got (which is the desired result).
In SQL Server, as I can't comment out that last GROUP BY clause, we're seeing e.g. two file_ids for GB (one for server 1 and one for server 2)
If you are using SQL Server 2005 or later you can use ROW_NUMBER():
SELECT t.File_ID,
t.full_file_name,
t.CountryCode,
t.Server_ID
FROM ( SELECT t.[File_ID],
t.full_file_name,
CountryCode = LEFT(t.full_file_name, 2),
t.Server_ID,
RowNumber = ROW_NUMBER() OVER(PARTITION BY LEFT(t.full_file_name, 2) ORDER BY [File_ID] DESC)
FROM [tracking_files] t
) t
WHERE t.RowNumber = 1;
If you are using a previous version you will need to use a subquery to get the maximum file ID per country code, then join back to your main table:
SELECT t.[File_ID],
t.full_file_name,
CountryCode = LEFT(t.full_file_name, 2),
t.Server_ID
FROM [tracking_files] t
INNER JOIN
( SELECT MaxFileID = MAX([File_ID])
FROM [tracking_files] t
GROUP BY LEFT(t.full_file_name, 2)
) MaxT
ON MaxT.MaxFileID = t.[File_ID];
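The join-back approach can be tried without a SQL Server instance. Below is a sketch in Python's `sqlite3` with invented file rows; since SQLite has no `LEFT()` function, `SUBSTR(full_file_name, 1, 2)` stands in for `LEFT(full_file_name, 2)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracking_files (
        file_id INTEGER PRIMARY KEY,
        full_file_name TEXT,
        server_id INTEGER
    );
    INSERT INTO tracking_files VALUES
        (1, 'GB_jan.csv', 1),
        (2, 'GB_feb.csv', 2),
        (3, 'US_jan.csv', 1),
        (4, 'GB_mar.csv', 1),
        (5, 'US_feb.csv', 2);
""")

latest = conn.execute("""
    SELECT t.file_id,
           SUBSTR(t.full_file_name, 1, 2) AS country_code,
           t.server_id
    FROM tracking_files t
    JOIN (
        -- highest file id per country code, regardless of server
        SELECT MAX(file_id) AS max_file_id
        FROM tracking_files
        GROUP BY SUBSTR(full_file_name, 1, 2)
    ) AS m ON m.max_file_id = t.file_id
    ORDER BY country_code
""").fetchall()

print(latest)
```

Because `server_id` is not part of the grouping, each country yields a single row, and the server shown is simply the one that holds that latest file.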