How to shorten the runtime of a BigQuery SQL query
I have two tables:
1. PEOPLE (PK, Name, Address, Zip, <<some random other columns>>)
2. EMAIL (PK, Name, Address, Zip, Email)
This is a one-to-many relationship: the two tables are linked by Name, Address, and Zip.
What I need is:
PEOPLE (PK, Name, Address, Zip, <<some random other columns>>, FK_Email1, Email1, FK_Email2, Email2, FK_Email3, Email3)
What I have so far is this:
#standardSQL
SELECT a.PK, a.FK, Source, FirstName, LastName, MiddleName, SuffixName, Gender, Age, DOB, Address, Address2, City, State, Zip, Zip4, Cleaned_HouseNumber, Cleaned_Street, Cleaned_City, Cleaned_County, Cleaned_State, Cleaned_Zip, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, ATTOM_ID, GEOID, SCORE, Latitude, Longitude, SpouseFirstName, SpouseLastName, HomeAvailableHomeEquity, HomeTotalLoans, HomeLoan1Amount, HomeLoan2Amount, HomeValueRangeCode, HomeValueRangeText, HomeMarketValue, HomeAssessedValue, HomeLoanToValue, HomeSQFT, HomeLotSQFT, HomeYearBuilt, HomePurchaseDate, HomeLoan1Date, HomeLoan2Date, HomeParcelNumber, HomePropertyType, DNC, HomeCompanyOwned, HomeTrustOwned, HomeOwnerOccupied, HomeType, HomePool, HomeGarage, HomeHeating, HomeCooling, HomeBedrooms, HomeBathrooms, HomeNumberOfUnits, MailingAddress, MailingCity, MailingState, MailingZip, MailingZip4, Married, Divorce, Education, Occupation, Ethnicity, LANGUAGE, RELIGION,
FK_Email[SAFE_ORDINAL(1)] FK_Email1, Emails[SAFE_ORDINAL(1)] Email1, FK_Email[SAFE_ORDINAL(2)] as FK_Email2, Emails[SAFE_ORDINAL(2)] Email2, FK_Email[SAFE_ORDINAL(3)] as FK_Email3, Emails[SAFE_ORDINAL(3)] Email3
FROM (
SELECT
P.PK, P.FK, P.Source, P.FirstName, P.LastName, MiddleName, SuffixName, Gender, Age, DOB, P.Address, Address2, P.City, P.State, P.Zip, Zip4, Cleaned_HouseNumber, Cleaned_Street, Cleaned_City, Cleaned_County, Cleaned_State, Cleaned_Zip, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, ATTOM_ID, GEOID, SCORE, Latitude, Longitude, SpouseFirstName, SpouseLastName, HomeAvailableHomeEquity, HomeTotalLoans, HomeLoan1Amount, HomeLoan2Amount, HomeValueRangeCode, HomeValueRangeText, HomeMarketValue, HomeAssessedValue, HomeLoanToValue, HomeSQFT, HomeLotSQFT, HomeYearBuilt, HomePurchaseDate, HomeLoan1Date, HomeLoan2Date, HomeParcelNumber, HomePropertyType, DNC, HomeCompanyOwned, HomeTrustOwned, HomeOwnerOccupied, HomeType, HomePool, HomeGarage, HomeHeating, HomeCooling, HomeBedrooms, HomeBathrooms, HomeNumberOfUnits, MailingAddress, MailingCity, MailingState, MailingZip, MailingZip4, Married, Divorce, Education, Occupation, Ethnicity, LANGUAGE, RELIGION
, ARRAY_AGG(E.Email) Emails, ARRAY_AGG(E.PK) FK_Email
FROM `db.ds.table1` P
left JOIN `db.ds.table2` E
ON P.FirstName = E.FirstName
AND P.LastName = E.LastName
AND P.Address = E.Address
AND P.Zip = E.Zip
Group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87
) a
My problem is that this already goes past the six-hour time limit.
Is there any way to make this run faster?
Thanks!
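I think the query below does the same thing in a more optimized way. It collapses the EMAIL table to one row per FirstName/LastName/Address/Zip (keeping at most three emails) before the join, so the 87-column GROUP BY over the joined result is no longer needed.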
#standardSQL
SELECT
PK, FK, Source, P.FirstName, P.LastName, MiddleName, SuffixName, Gender, Age, DOB, P.Address, Address2, City, State, P.Zip, Zip4, Cleaned_HouseNumber, Cleaned_Street, Cleaned_City, Cleaned_County, Cleaned_State, Cleaned_Zip, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, ATTOM_ID, GEOID, SCORE, Latitude, Longitude, SpouseFirstName, SpouseLastName, HomeAvailableHomeEquity, HomeTotalLoans, HomeLoan1Amount, HomeLoan2Amount, HomeValueRangeCode, HomeValueRangeText, HomeMarketValue, HomeAssessedValue, HomeLoanToValue, HomeSQFT, HomeLotSQFT, HomeYearBuilt, HomePurchaseDate, HomeLoan1Date, HomeLoan2Date, HomeParcelNumber, HomePropertyType, DNC, HomeCompanyOwned, HomeTrustOwned, HomeOwnerOccupied, HomeType, HomePool, HomeGarage, HomeHeating, HomeCooling, HomeBedrooms, HomeBathrooms, HomeNumberOfUnits, MailingAddress, MailingCity, MailingState, MailingZip, MailingZip4, Married, Divorce, Education, Occupation, Ethnicity, LANGUAGE, RELIGION,
FK_Email[SAFE_ORDINAL(1)] FK_Email1, Emails[SAFE_ORDINAL(1)] Email1, FK_Email[SAFE_ORDINAL(2)] AS FK_Email2, Emails[SAFE_ORDINAL(2)] Email2, FK_Email[SAFE_ORDINAL(3)] AS FK_Email3, Emails[SAFE_ORDINAL(3)] Email3
FROM `db.ds.table1` P
LEFT JOIN (
SELECT FirstName, LastName, Address, Zip,
ARRAY_AGG(Email ORDER BY PK LIMIT 3) Emails, ARRAY_AGG(PK ORDER BY PK LIMIT 3) FK_Email  -- the same ORDER BY keeps EmailN paired with FK_EmailN
FROM `db.ds.table2`
GROUP BY FirstName, LastName, Address, Zip
) E
ON P.FirstName = E.FirstName
AND P.LastName = E.LastName
AND P.Address = E.Address
AND P.Zip = E.Zip
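The key idea in the answer is to shrink the "many" side to one row per join key before joining. SQLite has no ARRAY_AGG or SAFE_ORDINAL, so the sketch below swaps in ROW_NUMBER() plus conditional aggregation (MAX(CASE ...)) to express the same idea. That pattern is also valid BigQuery SQL. The tables and rows are hypothetical miniatures of PEOPLE and EMAIL. The sketch runs with Python's built-in sqlite3 (SQLite 3.25+ is needed for window functions). It only checks the shape of the result, not BigQuery performance.

```python
import sqlite3

# Hypothetical miniature PEOPLE and EMAIL tables; the data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (PK INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, Address TEXT, Zip TEXT);
CREATE TABLE email (PK INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, Address TEXT, Zip TEXT, Email TEXT);
INSERT INTO people VALUES (1, 'Ann', 'Lee', '1 Main St', '10001'), (2, 'Bob', 'Ray', '2 Oak Ave', '10002');
INSERT INTO email VALUES
  (10, 'Ann', 'Lee', '1 Main St', '10001', 'ann1@example.com'),
  (11, 'Ann', 'Lee', '1 Main St', '10001', 'ann2@example.com'),
  (12, 'Ann', 'Lee', '1 Main St', '10001', 'ann3@example.com'),
  (13, 'Ann', 'Lee', '1 Main St', '10001', 'ann4@example.com');
""")

QUERY = """
WITH ranked AS (
  -- Number each person's emails (lowest PK first) so only three survive.
  SELECT FirstName, LastName, Address, Zip, PK, Email,
         ROW_NUMBER() OVER (PARTITION BY FirstName, LastName, Address, Zip ORDER BY PK) AS rn
  FROM email
), e AS (
  -- Collapse the EMAIL side to one row per join key BEFORE the join.
  SELECT FirstName, LastName, Address, Zip,
         MAX(CASE WHEN rn = 1 THEN PK END)    AS FK_Email1,
         MAX(CASE WHEN rn = 1 THEN Email END) AS Email1,
         MAX(CASE WHEN rn = 2 THEN PK END)    AS FK_Email2,
         MAX(CASE WHEN rn = 2 THEN Email END) AS Email2,
         MAX(CASE WHEN rn = 3 THEN PK END)    AS FK_Email3,
         MAX(CASE WHEN rn = 3 THEN Email END) AS Email3
  FROM ranked
  WHERE rn <= 3
  GROUP BY FirstName, LastName, Address, Zip
)
SELECT p.PK, p.FirstName, e.FK_Email1, e.Email1, e.FK_Email2, e.Email2, e.FK_Email3, e.Email3
FROM people AS p
LEFT JOIN e
  ON p.FirstName = e.FirstName AND p.LastName = e.LastName
 AND p.Address = e.Address AND p.Zip = e.Zip
ORDER BY p.PK
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Bob has no emails, so the LEFT JOIN keeps him with NULL email columns, just as the answer's query does.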
Related
Displaying the Characters Represented with an Integer in SQL
I'm trying to generate a report that lists the race, ethnicity, and county of each client. Right now, the report displays 1, 2, 3, etc. for the selected race instead of Black, White, Other, etc. The same happens for ethnicity and county. What is the proper way to display the text description instead of the integer for race, ethnicity, and county? My query is below; any assistance is appreciated. Thank you.
SELECT person.idFamily AS Family_ID,
       person.id AS Person_ID,
       (SELECT person.firstName + ', ' + person.lastName) AS Name,
       person.Race AS Race,
       person.Ethnicity AS Ethnicity,
       family.capidCounty AS County,
       person.birthDate AS BirthDate,
       DATEDIFF(year, person.birthDate, getdate()) AS Age
FROM Family
LEFT JOIN person ON family.Id = person.idFamily
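A common way to turn stored integer codes into descriptions is to LEFT JOIN each code column to a lookup table that maps code to text. The sketch below assumes a hypothetical race_lookup table and an invented code mapping; the question does not say which number means which race. It uses SQLite through Python's sqlite3, where || concatenates strings (SQL Server uses + or CONCAT). Ethnicity and county would each get their own lookup join in the same way. If no lookup table exists, a CASE expression on the code can do the same job for a short, fixed list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: person.Race stores an integer code, and a separate
-- race_lookup table (invented here) maps each code to its description.
CREATE TABLE person (id INTEGER PRIMARY KEY, firstName TEXT, Race INTEGER);
CREATE TABLE race_lookup (code INTEGER PRIMARY KEY, description TEXT);
INSERT INTO race_lookup VALUES (1, 'Black'), (2, 'White'), (3, 'Other');
INSERT INTO person VALUES (1, 'Ana', 2), (2, 'Ben', 1), (3, 'Cy', 9);
""")

rows = conn.execute("""
SELECT p.id,
       p.firstName,
       -- LEFT JOIN keeps people whose code has no lookup row; show the raw code instead.
       COALESCE(r.description, 'Unknown code ' || p.Race) AS Race
FROM person AS p
LEFT JOIN race_lookup AS r ON r.code = p.Race
ORDER BY p.id
""").fetchall()
print(rows)
```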
Why is my code being ignored?
When I execute this query, the only filter that works is the zip codes. It does not limit the results to service dates on or after 08/01/16 or to my billing providers, and I am not sure why.
SELECT Name, BirthDate, Address1, Address2, City, StateProvince, ZipCode, BillingProviderID, ServiceDate, ChargeCatID
FROM dbo.PbrChargeTransactions
WHERE ChargeCatID = 'EM'
AND ServiceDate >= '08/01/16'
AND BillingProviderID IN ('AAD.FD','DSD.DFD','ASDF.DD')
AND ZipCode Like '68730%' OR ZipCode Like '68792%' OR ZipCode Like '68739%' OR ZipCode Like '68718%'
OR ZipCode Like '57069%' OR ZipCode Like '57031%' OR ZipCode Like '57078%' OR ZipCode Like '57066%'
OR ZipCode Like '57063%' OR ZipCode Like '57037%' OR ZipCode Like '57073%' OR ZipCode Like '57029%'
OR ZipCode Like '57070%'
ORDER BY Name
Operator precedence: "And" binds more tightly than "or". That is, only the first zip-code comparison is evaluated together with your "and" conditions. The rest are ORs, and if any of them matches, you get the record. Try this:
SELECT Name, BirthDate, Address1, Address2, City, StateProvince, ZipCode, BillingProviderID, ServiceDate, ChargeCatID
FROM dbo.PbrChargeTransactions
WHERE ChargeCatID = 'EM'
AND ServiceDate >= '08/01/16'
AND BillingProviderID IN ('AAD.FD','DSD.DFD','ASDF.DD')
AND (ZipCode Like '68730%' OR ZipCode Like '68792%' OR ZipCode Like '68739%' OR ZipCode Like '68718%'
  OR ZipCode Like '57069%' OR ZipCode Like '57031%' OR ZipCode Like '57078%' OR ZipCode Like '57066%'
  OR ZipCode Like '57063%' OR ZipCode Like '57037%' OR ZipCode Like '57073%' OR ZipCode Like '57029%'
  OR ZipCode Like '57070%')
ORDER BY Name
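To see the precedence problem concretely, the sketch below runs both forms of the WHERE clause against two invented rows, using SQLite through Python's sqlite3. The table and its data are hypothetical, and only the category and zip filters are kept. Without parentheses, a row with the wrong category still comes back because it matches the second zip prefix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE charges (Name TEXT, ChargeCatID TEXT, ZipCode TEXT);
INSERT INTO charges VALUES
  ('match',     'EM', '68730-1234'),  -- passes every filter
  ('wrong cat', 'XX', '68792-0001');  -- wrong category, but matches the second zip prefix
""")

# Without parentheses AND binds tighter: (cat AND zip1) OR zip2.
unparenthesized = conn.execute("""
SELECT Name FROM charges
WHERE ChargeCatID = 'EM' AND ZipCode LIKE '68730%' OR ZipCode LIKE '68792%'
ORDER BY Name
""").fetchall()

# With parentheses the category filter applies to every zip prefix.
parenthesized = conn.execute("""
SELECT Name FROM charges
WHERE ChargeCatID = 'EM' AND (ZipCode LIKE '68730%' OR ZipCode LIKE '68792%')
ORDER BY Name
""").fetchall()

print(unparenthesized)  # [('match',), ('wrong cat',)]
print(parenthesized)    # [('match',)]
```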
You can use (...) as n8wrl shows, or the following, which I think will be faster.
SELECT Name, BirthDate, Address1, Address2, City, StateProvince, ZipCode, BillingProviderID, ServiceDate, ChargeCatID
FROM dbo.PbrChargeTransactions
WHERE ChargeCatID = 'EM'
AND ServiceDate >= '08/01/16'
AND BillingProviderID IN ('AAD.FD','DSD.DFD','ASDF.DD')
AND LEFT(ZipCode,5) IN ('68730','68792','68739','68718','57069','57031',
                        '57078','57066','57063','57037','57073','57029','57070')
ORDER BY Name
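The IN-list version selects the same rows as the parenthesized LIKE list, because each pattern is a fixed five-character prefix. The sketch below checks that on a few invented zip codes, using SQLite through Python's sqlite3. SQLite has no LEFT(), so substr(ZipCode, 1, 5) stands in for it. It demonstrates equivalent results only. Speed depends on the engine and indexes: wrapping the column in a function may stop SQL Server from using an index seek, which a LIKE 'prefix%' predicate can use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE charges (ZipCode TEXT)")
conn.executemany(
    "INSERT INTO charges VALUES (?)",
    [("68730",), ("68730-1234",), ("57069-9999",), ("12345",), ("6873",)],
)

# Parenthesized OR of prefix patterns (two of the question's prefixes).
like_rows = conn.execute("""
SELECT ZipCode FROM charges
WHERE (ZipCode LIKE '68730%' OR ZipCode LIKE '57069%')
ORDER BY ZipCode
""").fetchall()

# SQLite has no LEFT(); substr(x, 1, 5) takes the first five characters.
in_rows = conn.execute("""
SELECT ZipCode FROM charges
WHERE substr(ZipCode, 1, 5) IN ('68730', '57069')
ORDER BY ZipCode
""").fetchall()

print(like_rows == in_rows, like_rows)
```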
The issue is probably that you are missing some parentheses. Be aware of operator precedence: if you write A AND B AND zip1 OR zip2 OR zip3, this is equivalent to ((A AND B) AND zip1) OR zip2 OR zip3. I think you intended to write A AND B AND (zip1 OR zip2 OR zip3), where A, B, zip1, ... are boolean conditions like zip LIKE '12345%'.
How to write the following Queries in SQL
My tables:
Musician (id, mname, aname, percentage (fee for the agent))
Singer (id, gender)
Instrumentalist (id, instrument)
Agent (aname, street, city, zip)
Festival (title, place, sdate, edate)
Event (title, edate, etime)
Booked (id, title (Event.title), edate, etime, salary)
1. For each festival, find the agent whose overall profit at that festival is the highest. Present the festival's name, the agent's name, and the agent's overall profit.
2. For Festival.title = "Autumn Festival" and salary < 8000, find the date and time of the event in which only instrumentalists take part and the number of instrumentalists is the lowest.
My attempts:
1:
select f1.title, aname, sum(salary*percentage/100) as feeAgent
from festival as f1, booked, musician
where booked.id = musician.id and booked.edate between f1.sdate and f1.edate
group by f1.title, aname
having sum(salary*percentage/100) > any (select sum(salary*percentage/100) from festival as f1, booked, musician where booked.id = musician.id and booked.edate between f1.sdate and f1.edate group by musician.aname)
2:
select event.edate, event.etime
from event, festival as f1, booked natural join musician natural join instrumentalist
where event.title = f1.title and f1.title Like 'spring%' and booked.salary < 8000 and booked.edate between f1.sdate and f1.edate
group by event.edate, event.etime
having count(booked.title) < all (select count(*) from event, festival as f1, booked natural join musician natural join instrumentalist where event.title = f1.title and f1.title Like 'spring%' and booked.salary < 8000 and booked.edate between f1.sdate and f1.edate group by booked.title)
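In the first attempt, "> ANY" is satisfied by any total larger than the smallest value the subquery returns. That subquery is also not restricted to the current festival, so the query does not pick the top agent per festival. One possible fix ranks agents within each festival with a window function and keeps rank 1. The sketch below does that in SQLite through Python's sqlite3 (3.25+ for RANK()), with invented data. Like the question's own query, it assumes a booking belongs to a festival when its date falls between the festival's sdate and edate. It covers only task 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE musician (id INTEGER PRIMARY KEY, mname TEXT, aname TEXT, percentage REAL);
CREATE TABLE festival (title TEXT PRIMARY KEY, place TEXT, sdate TEXT, edate TEXT);
CREATE TABLE booked (id INTEGER, title TEXT, edate TEXT, etime TEXT, salary REAL);
INSERT INTO musician VALUES (1, 'M1', 'AgentA', 10), (2, 'M2', 'AgentB', 20);
INSERT INTO festival VALUES
  ('Autumn Festival', 'Park', '2024-09-01', '2024-09-30'),
  ('Spring Festival', 'Hall', '2024-03-01', '2024-03-31');
INSERT INTO booked VALUES
  (1, 'Opening', '2024-09-05', '18:00', 5000),  -- AgentA earns 500
  (2, 'Opening', '2024-09-05', '20:00', 2000),  -- AgentB earns 400
  (2, 'Finale',  '2024-03-10', '19:00', 4000);  -- AgentB earns 800
""")

rows = conn.execute("""
WITH profit AS (
  -- Agent fee per festival; a booking counts when its date is inside the festival.
  SELECT f.title AS festival, m.aname AS agent,
         SUM(b.salary * m.percentage / 100) AS total_profit
  FROM festival AS f
  JOIN booked   AS b ON b.edate BETWEEN f.sdate AND f.edate
  JOIN musician AS m ON m.id = b.id
  GROUP BY f.title, m.aname
), ranked AS (
  -- RANK() restarts per festival; agents tied for the top profit all get rank 1.
  SELECT festival, agent, total_profit,
         RANK() OVER (PARTITION BY festival ORDER BY total_profit DESC) AS rnk
  FROM profit
)
SELECT festival, agent, total_profit FROM ranked WHERE rnk = 1 ORDER BY festival
""").fetchall()
print(rows)
```

RANK() is used rather than ROW_NUMBER() so that ties for the highest profit are all reported rather than one being dropped arbitrarily.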
Select only one row where the address matches
Using SQL Server, I have a Contact table that contains the name and address of my contacts. I have several instances of multiple contacts living at the same address. When sending my newsletter, I only want to send it to each address one time. How can I modify the query below to only display the first contact at each address?
SELECT dbo_Contact.Contact_Title AS Title, dbo_Contact.Contact_FirstName AS [First Name], dbo_Contact.Contact_LastName AS [Last Name],
       dbo_Contact.Contact_Suffix AS Suffix, dbo_Contact.Business_Name AS [Business Name],
       dbo_Contact.Contact_Address1 AS [Address 1], dbo_Contact.Contact_Address2 AS [Address 2],
       dbo_Contact.Contact_City AS City, dbo_Contact.Contact_State AS State, dbo_Contact.Contact_Zip AS Zip,
       dbo_Contact.Contact_Email AS Email
FROM dbo_Contact
INNER JOIN dbo_Mailing_Subscribers ON dbo_Contact.[ContactID] = dbo_Mailing_Subscribers.[ContactID]
WHERE (((dbo_Contact.Contact_Inactive) = False) AND ((dbo_Mailing_Subscribers.Mailing_ID) = True) AND ((dbo_Mailing_Subscribers.Subscribed) = True));
For example, if Kurt and William live at 123 A Street and Steve lives at 123 B Avenue, I would only want to return records for Kurt and Steve.
Okay, here is my solution:
1 - Use table aliases to reduce the code footprint.
2 - Change True/False to 1/0, since T-SQL represents Boolean flags as bit values.
3 - Create RowNum, partitioned by Address1, Address2, and Zip and ordered by last name, first name.
4 - Select only one row from each set of duplicates.
;WITH cteData AS (
SELECT C.Contact_Title AS Title, C.Contact_FirstName AS [First Name], C.Contact_LastName AS [Last Name],
       C.Contact_Suffix AS Suffix, C.Business_Name AS [Business Name],
       C.Contact_Address1 AS [Address 1], C.Contact_Address2 AS [Address 2],
       C.Contact_City AS City, C.Contact_State AS State, C.Contact_Zip AS Zip, C.Contact_Email AS Email,
       row_number() over (partition by Contact_Address1, Contact_Address2, Contact_Zip order by Contact_LastName, Contact_FirstName) as RowNum
FROM dbo_Contact as C
INNER JOIN dbo_Mailing_Subscribers as S ON C.[ContactID] = S.[ContactID]
WHERE C.Contact_Inactive = 0 AND S.Mailing_ID = 1 AND S.Subscribed = 1
)
select * from cteData where RowNum = 1;
In short, at each address one mailing will go out to the person whose last/first name sorts closest to A.
You can use row_number() and a subquery:
select <list of columns you want>
from (SELECT row_number() over (partition by Contact_Address1, Contact_Address2, Contact_Zip order by dbo_Contact.ContactID) as seqnum,
             dbo_Contact.Contact_Title AS Title, dbo_Contact.Contact_FirstName AS [First Name], dbo_Contact.Contact_LastName AS [Last Name],
             dbo_Contact.Contact_Suffix AS Suffix, dbo_Contact.Business_Name AS [Business Name],
             dbo_Contact.Contact_Address1 AS [Address 1], dbo_Contact.Contact_Address2 AS [Address 2],
             dbo_Contact.Contact_City AS City, dbo_Contact.Contact_State AS State, dbo_Contact.Contact_Zip AS Zip,
             dbo_Contact.Contact_Email AS Email
      FROM dbo_Contact
      INNER JOIN dbo_Mailing_Subscribers ON dbo_Contact.[ContactID] = dbo_Mailing_Subscribers.[ContactID]
      WHERE (((dbo_Contact.Contact_Inactive) = False) AND ((dbo_Mailing_Subscribers.Mailing_ID) = True) AND ((dbo_Mailing_Subscribers.Subscribed) = True))
     ) t
where seqnum = 1;
It is unclear what you mean by the "same address" and by "the first contact". For the address, the query uses the two address lines and the zip code. For "first contact", the query uses the lowest id in the contact table.
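Both answers rely on ROW_NUMBER() restarting at 1 for each address partition and then keeping only row 1. The sketch below reproduces the question's Kurt/William/Steve example in SQLite through Python's sqlite3 (3.25+ for window functions). The table is a hypothetical trimmed-down contact table, and the IDs, empty Address2 values, and zip code are invented. It orders by ContactID, as in the second answer. Window partitions, like GROUP BY, treat NULL Address2 values as equal, so contacts with no second address line still land in the same partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contact (ContactID INTEGER PRIMARY KEY, FirstName TEXT,
                      Address1 TEXT, Address2 TEXT, Zip TEXT);
INSERT INTO contact VALUES
  (1, 'Kurt',    '123 A Street', NULL, '11111'),
  (2, 'William', '123 A Street', NULL, '11111'),
  (3, 'Steve',   '123 B Avenue', NULL, '11111');
""")

rows = conn.execute("""
SELECT FirstName
FROM (
  SELECT FirstName,
         -- One partition per address; the lowest ContactID in each gets 1.
         ROW_NUMBER() OVER (PARTITION BY Address1, Address2, Zip
                            ORDER BY ContactID) AS seqnum
  FROM contact
) AS t
WHERE seqnum = 1
ORDER BY FirstName
""").fetchall()
print(rows)
```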
Customer credit system, Oracle SQL
I am an overseas student, so I am not familiar with "credit systems", but I have a database question related to one that I do not fully understand. Here is the question:
Write a query: The billing officer would like to know which customers are currently over their credit limit.
The database schema is:
Sales_Rep (SLSRep_Number [pk], Last, First, Street, City, State, Post_Code, Total_Commission, Commission_Rate)
Customer (Customer_Number [pk], Last, First, Street, City, State, Post_Code, Balance, Credit_Limit, SLSRep_Number [fk])
Orders (Order_Number [pk], Order_Date, Customer_Number [fk])
Part (Part_Number [pk], Part_Description, Units_on_Hand, Item_Class, Warehouse_Number, Unit_Price)
Order_Line (Order_Number [pk1], Part_Number [pk2], Number_Ordered, Quoted_Price)
Any idea? Is it just:
Select customer_number, last, first, balance, credit_limit from customer where balance > credit_limit;
or might it be:
select *
from (select mytable.customer_number, sum(mytable.number_ordered*mytable.quoted_price) as customer_cost
      from (select customer.customer_number, order_line.number_ordered, order_line.quoted_price
            from customer, orders, order_line
            where customer.customer_number = orders.customer_number
            and orders.order_number = order_line.order_number) mytable
      group by mytable.customer_number) mytable2, customer
where customer.credit_limit < mytable2.customer_cost
and customer.customer_number = mytable2.customer_number;
The first query is right: it returns the customers whose balance exceeds their credit limit.
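The first query compares two columns of the same Customer row, so it needs no joins. The sketch below checks it on three hypothetical customers, using SQLite through Python's sqlite3. It shows that the strict > leaves out a customer sitting exactly at the limit. The second query instead compares the limit with the total of every order line ever placed. Assuming Balance already reflects what the customer currently owes, that total would also count orders that have been paid off.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (Customer_Number INTEGER PRIMARY KEY, Last TEXT, First TEXT,
                       Balance REAL, Credit_Limit REAL);
INSERT INTO customer VALUES
  (1, 'Adams', 'Sue', 1200.00, 1000.00),  -- over the limit
  (2, 'Brown', 'Tom',  800.00, 1000.00),  -- under the limit
  (3, 'Chen',  'Li',  1000.00, 1000.00);  -- exactly at the limit
""")

# The accepted answer's query: a single-table comparison of two columns.
over_limit = conn.execute("""
SELECT Customer_Number, Last, First, Balance, Credit_Limit
FROM customer
WHERE Balance > Credit_Limit
ORDER BY Customer_Number
""").fetchall()
print(over_limit)
```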