How to shorten the runtime of a BigQuery query

I have two tables:
1. PEOPLE (PK, Name, Address, Zip, <<some random other columns>>)
2. EMAIL (PK, Name, Address, Zip, Email)
The relationship is one-to-many (one person can have several emails), and the tables are linked by Name, Address, and Zip.
What I need is:
PEOPLE (PK, Name, Address, Zip, <<some random other columns>>, FK_Email1, Email1, FK_Email2, Email2, FK_Email3, Email3)
What I have so far is this:
#standardSQL
SELECT a.PK, a.FK, Source, FirstName, LastName, MiddleName, SuffixName, Gender, Age, DOB, Address, Address2, City, State, Zip, Zip4, Cleaned_HouseNumber, Cleaned_Street, Cleaned_City, Cleaned_County, Cleaned_State, Cleaned_Zip, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, ATTOM_ID, GEOID, SCORE, Latitude, Longitude, SpouseFirstName, SpouseLastName, HomeAvailableHomeEquity, HomeTotalLoans, HomeLoan1Amount, HomeLoan2Amount, HomeValueRangeCode, HomeValueRangeText, HomeMarketValue, HomeAssessedValue, HomeLoanToValue, HomeSQFT, HomeLotSQFT, HomeYearBuilt, HomePurchaseDate, HomeLoan1Date, HomeLoan2Date, HomeParcelNumber, HomePropertyType, DNC, HomeCompanyOwned, HomeTrustOwned, HomeOwnerOccupied, HomeType, HomePool, HomeGarage, HomeHeating, HomeCooling, HomeBedrooms, HomeBathrooms, HomeNumberOfUnits, MailingAddress, MailingCity, MailingState, MailingZip, MailingZip4, Married, Divorce, Education, Occupation, Ethnicity, LANGUAGE, RELIGION,
FK_Email[SAFE_ORDINAL(1)] FK_Email1, Emails[SAFE_ORDINAL(1)] Email1, FK_Email[SAFE_ORDINAL(2)] as FK_Email2, Emails[SAFE_ORDINAL(2)] Email2, FK_Email[SAFE_ORDINAL(3)] as FK_Email3, Emails[SAFE_ORDINAL(3)] Email3
FROM (
SELECT
P.PK, P.FK, P.Source, P.FirstName, P.LastName, MiddleName, SuffixName, Gender, Age, DOB, P.Address, Address2, P.City, P.State, P.Zip, Zip4, Cleaned_HouseNumber, Cleaned_Street, Cleaned_City, Cleaned_County, Cleaned_State, Cleaned_Zip, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, ATTOM_ID, GEOID, SCORE, Latitude, Longitude, SpouseFirstName, SpouseLastName, HomeAvailableHomeEquity, HomeTotalLoans, HomeLoan1Amount, HomeLoan2Amount, HomeValueRangeCode, HomeValueRangeText, HomeMarketValue, HomeAssessedValue, HomeLoanToValue, HomeSQFT, HomeLotSQFT, HomeYearBuilt, HomePurchaseDate, HomeLoan1Date, HomeLoan2Date, HomeParcelNumber, HomePropertyType, DNC, HomeCompanyOwned, HomeTrustOwned, HomeOwnerOccupied, HomeType, HomePool, HomeGarage, HomeHeating, HomeCooling, HomeBedrooms, HomeBathrooms, HomeNumberOfUnits, MailingAddress, MailingCity, MailingState, MailingZip, MailingZip4, Married, Divorce, Education, Occupation, Ethnicity, LANGUAGE, RELIGION
, ARRAY_AGG(E.Email) Emails, ARRAY_AGG(E.PK) FK_Email
FROM `db.ds.table1` P
left JOIN `db.ds.table2` E
ON P.FirstName = E.FirstName
AND P.LastName = E.LastName
AND P.Address = E.Address
AND P.Zip = E.Zip
Group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87
) a
My problem is that this already runs past the six-hour time limit.
Is there any way to make this run faster?
Thanks!

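I believe the query below does the same thing in a more optimized way. It aggregates the emails per FirstName, LastName, Address, and Zip first (keeping at most three per group) and only then joins the result to the people table, which avoids grouping by all 87 columns: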
#standardSQL
SELECT
PK, FK, Source, P.FirstName, P.LastName, MiddleName, SuffixName, Gender, Age, DOB, P.Address, Address2, City, State, P.Zip, Zip4, Cleaned_HouseNumber, Cleaned_Street, Cleaned_City, Cleaned_County, Cleaned_State, Cleaned_Zip, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, ATTOM_ID, GEOID, SCORE, Latitude, Longitude, SpouseFirstName, SpouseLastName, HomeAvailableHomeEquity, HomeTotalLoans, HomeLoan1Amount, HomeLoan2Amount, HomeValueRangeCode, HomeValueRangeText, HomeMarketValue, HomeAssessedValue, HomeLoanToValue, HomeSQFT, HomeLotSQFT, HomeYearBuilt, HomePurchaseDate, HomeLoan1Date, HomeLoan2Date, HomeParcelNumber, HomePropertyType, DNC, HomeCompanyOwned, HomeTrustOwned, HomeOwnerOccupied, HomeType, HomePool, HomeGarage, HomeHeating, HomeCooling, HomeBedrooms, HomeBathrooms, HomeNumberOfUnits, MailingAddress, MailingCity, MailingState, MailingZip, MailingZip4, Married, Divorce, Education, Occupation, Ethnicity, LANGUAGE, RELIGION,
FK_Email[SAFE_ORDINAL(1)] FK_Email1, Emails[SAFE_ORDINAL(1)] Email1, FK_Email[SAFE_ORDINAL(2)] AS FK_Email2, Emails[SAFE_ORDINAL(2)] Email2, FK_Email[SAFE_ORDINAL(3)] AS FK_Email3, Emails[SAFE_ORDINAL(3)] Email3
FROM `db.ds.table1` P
LEFT JOIN (
SELECT FirstName, LastName, Address, Zip,
ARRAY_AGG(Email LIMIT 3) Emails, ARRAY_AGG(PK LIMIT 3) FK_Email
FROM `db.ds.table2`
GROUP BY FirstName, LastName, Address, Zip
) E
ON P.FirstName = E.FirstName
AND P.LastName = E.LastName
AND P.Address = E.Address
AND P.Zip = E.Zip
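
To try the "aggregate first, then join" shape on toy data without a BigQuery project, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is an assumption for illustration: the tables `people` and `email`, the sample rows, and the trimmed column list. SQLite has no ARRAY_AGG or SAFE_ORDINAL, so ROW_NUMBER() plus conditional aggregation stands in for them here (this needs SQLite 3.25 or newer); it is not the BigQuery syntax from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, heavily trimmed versions of the two tables.
CREATE TABLE people (PK INTEGER, FirstName TEXT, LastName TEXT, Address TEXT, Zip TEXT);
CREATE TABLE email  (PK INTEGER, FirstName TEXT, LastName TEXT, Address TEXT, Zip TEXT, Email TEXT);
INSERT INTO people VALUES
  (1, 'Ann', 'Lee', '1 Main St', '10001'),
  (2, 'Bob', 'Ray', '2 Oak Ave', '20002');
INSERT INTO email VALUES
  (10, 'Ann', 'Lee', '1 Main St', '10001', 'ann1@example.com'),
  (11, 'Ann', 'Lee', '1 Main St', '10001', 'ann2@example.com'),
  (12, 'Ann', 'Lee', '1 Main St', '10001', 'ann3@example.com'),
  (13, 'Ann', 'Lee', '1 Main St', '10001', 'ann4@example.com');
""")

query = """
WITH ranked AS (
  -- Number each person's emails; ordering by PK keeps the result deterministic.
  SELECT FirstName, LastName, Address, Zip, PK, Email,
         ROW_NUMBER() OVER (PARTITION BY FirstName, LastName, Address, Zip
                            ORDER BY PK) AS rn
  FROM email
),
per_person AS (
  -- Collapse the many side to one row per join key BEFORE joining.
  SELECT FirstName, LastName, Address, Zip,
         MAX(CASE WHEN rn = 1 THEN PK END)    AS FK_Email1,
         MAX(CASE WHEN rn = 1 THEN Email END) AS Email1,
         MAX(CASE WHEN rn = 2 THEN PK END)    AS FK_Email2,
         MAX(CASE WHEN rn = 2 THEN Email END) AS Email2,
         MAX(CASE WHEN rn = 3 THEN PK END)    AS FK_Email3,
         MAX(CASE WHEN rn = 3 THEN Email END) AS Email3
  FROM ranked
  WHERE rn <= 3
  GROUP BY FirstName, LastName, Address, Zip
)
SELECT P.PK, E.FK_Email1, E.Email1, E.FK_Email2, E.Email2, E.FK_Email3, E.Email3
FROM people P
LEFT JOIN per_person E
  ON P.FirstName = E.FirstName AND P.LastName = E.LastName
 AND P.Address = E.Address AND P.Zip = E.Zip
ORDER BY P.PK
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The person with four emails gets the first three (by email PK), and the person with no emails still appears once with NULLs, because the people table is never grouped.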

Related

Displaying the Characters Represented with an Integer in SQL

I'm trying to generate a report where it lists the race, ethnicity and county of a client. Right now, the report displays 1,2,3, etc. for the race that has been selected as opposed to Black, White, Other, etc. This is also the case for ethnicity and county. I'm wondering what is the proper way to display the letter description as opposed to the integer for the race, ethnicity, and county. I've included my query below, any assistance is appreciated. Thank you.
SELECT
person.idFamily AS Family_ID,
person.id AS Person_ID,
(SELECT person.firstName+ ', ' + person.lastName) AS Name,
person.Race AS Race,
person.Ethnicity AS Ethnicity,
family.capidCounty AS County,
person.birthDate AS BirthDate,
DATEDIFF(year,person.birthDate,getdate()) as Age
FROM Family
LEFT JOIN person ON family.Id = person.idFamily

Why Is my code being ignored?

When I execute this query, the only filter that seems to work is the zip codes. The results are not limited to service dates on or after 08/01/16, or to my billing providers, and I am not sure why.
SELECT Name, BirthDate, Address1, Address2, City, StateProvince, ZipCode, BillingProviderID, ServiceDate, ChargeCatID
FROM dbo.PbrChargeTransactions
WHERE ChargeCatID = 'EM'
AND ServiceDate >= '08/01/16'
AND BillingProviderID IN ('AAD.FD','DSD.DFD','ASDF.DD')
AND ZipCode Like '68730%'
OR ZipCode Like '68792%'
OR ZipCode Like '68739%'
OR ZipCode Like '68718%'
OR ZipCode Like '57069%'
OR ZipCode Like '57031%'
OR ZipCode Like '57078%'
OR ZipCode Like '57066%'
OR ZipCode Like '57063%'
OR ZipCode Like '57037%'
OR ZipCode Like '57073%'
OR ZipCode Like '57029%'
OR ZipCode Like '57070%'
ORDER BY Name

Operator precedence: AND binds more tightly than OR. That is, only the first zip-code comparison is combined with the rest of your AND conditions. The remaining comparisons are ORed in on their own, so any row that matches one of them is returned regardless of the other filters.
Try this:
SELECT Name, BirthDate, Address1, Address2, City, StateProvince, ZipCode, BillingProviderID, ServiceDate, ChargeCatID
FROM dbo.PbrChargeTransactions
WHERE ChargeCatID = 'EM'
AND ServiceDate >= '08/01/16'
AND BillingProviderID IN ('AAD.FD','DSD.DFD','ASDF.DD')
AND (ZipCode Like '68730%'
OR ZipCode Like '68792%'
OR ZipCode Like '68739%'
OR ZipCode Like '68718%'
OR ZipCode Like '57069%'
OR ZipCode Like '57031%'
OR ZipCode Like '57078%'
OR ZipCode Like '57066%'
OR ZipCode Like '57063%'
OR ZipCode Like '57037%'
OR ZipCode Like '57073%'
OR ZipCode Like '57029%'
OR ZipCode Like '57070%'
)
ORDER BY Name

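You can use (...) as n8wrl shows, or the following, which is more compact and which I think will be faster (although wrapping ZipCode in LEFT() can prevent an index on that column from being used, so it is worth testing on your data).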
SELECT Name, BirthDate, Address1, Address2, City, StateProvince, ZipCode, BillingProviderID, ServiceDate, ChargeCatID
FROM dbo.PbrChargeTransactions
WHERE ChargeCatID = 'EM'
AND ServiceDate >= '08/01/16'
AND BillingProviderID IN ('AAD.FD','DSD.DFD','ASDF.DD')
AND LEFT(ZipCode,5) in ('68730','68792','68739','68718','57069','57031', '57078','57066','57063','57037','57073','57029','57070')
ORDER BY Name

The issue might be that you missed some parentheses. Please be aware of operator precedence.
If you write
A AND B AND zip1 OR zip2 OR zip3
this is equivalent to
((A AND B) AND zip1) OR zip2 OR zip3
I think you intended to write
A AND B AND (zip1 OR zip2 OR zip3)
where A, B, zip1,... are boolean conditions like zip LIKE '12345%'
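
The precedence rule can be checked on a few rows. Below is a minimal sketch using Python's built-in sqlite3 module; the table `charges`, its three columns, and the sample rows are made up for illustration, and SQLite stands in for SQL Server (both give AND higher precedence than OR).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical three-column stand-in for dbo.PbrChargeTransactions.
CREATE TABLE charges (Name TEXT, ChargeCatID TEXT, ZipCode TEXT);
INSERT INTO charges VALUES
  ('match',     'EM', '68730-1234'),  -- right category, right zip
  ('wrong_cat', 'XX', '68792-0001'),  -- wrong category, listed zip
  ('wrong_zip', 'EM', '99999');       -- right category, unlisted zip
""")

# Parsed as (ChargeCatID = 'EM' AND zip1) OR zip2:
# the second zip test bypasses the category filter.
without_parens = [r[0] for r in conn.execute("""
    SELECT Name FROM charges
    WHERE ChargeCatID = 'EM'
      AND ZipCode LIKE '68730%'
       OR ZipCode LIKE '68792%'
    ORDER BY Name
""")]

# Parsed as ChargeCatID = 'EM' AND (zip1 OR zip2).
with_parens = [r[0] for r in conn.execute("""
    SELECT Name FROM charges
    WHERE ChargeCatID = 'EM'
      AND (ZipCode LIKE '68730%'
        OR ZipCode LIKE '68792%')
    ORDER BY Name
""")]

print(without_parens)  # ['match', 'wrong_cat']
print(with_parens)     # ['match']
```

The row with the wrong category slips through only in the version without parentheses, which is the behavior described in the question.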

How to write the following Queries in SQL

My tables:
Musician (id, mname, aname, percentage (fee for the agent))
Singer (id, gender)
Instrumentalist (id, instrument)
Agent (aname, street, city, zip)
Festival (title, place, sdate, edate)
Event (title, edate, etime)
Booked (id, title (Event.title), edate, etime, salary)
1. For each festival, find the agent whose overall profit at that festival is the highest. Show the festival's name, the agent's name, and the agent's overall profit.
2. For Festival.title = "Autumn Festival" and salary < 8000, find the date and time of the event in which only instrumentalists take part and in which the number of instrumentalists is the lowest.
My attempts:
1:
select f1.title , aname , sum (salary*percentage/100) as feeAgent
from festival as f1 , booked , musician
where booked.id=musician.id and booked.edate between f1.sdate and f1.edate
group by f1.title , aname
having sum (salary*percentage/100)>any(select sum (salary*percentage/100)
from festival as f1 , booked , musician
where booked.id=musician.id and booked.edate between f1.sdate and f1.edate
group by musician.aname)
2:
select event.edate, event.etime
from event, festival as f1, booked natural join musician natural join instrumentalist
where event.title=f1.title and f1.title Like 'spring%' and booked.salary<8000 and booked.edate between f1.sdate and f1.edate
group by event.edate, event.etime
having count(booked.title)<all(select count(*)
from event,festival as f1,booked natural join musician natural join instrumentalist
where event.title=f1.title and f1.title Like 'spring%' and booked.salary<8000 and booked.edate between f1.sdate and f1.edate
group by booked.title)

Select Only one row where Address Matches

Using SQL Server, I have a Contact table that contains the name and address of my contacts. I have several instances of multiple contacts living at the same address. When sending my newsletter, I only want to send it to each address one time. How can I modify the query below to only display the first contact at each address?
SELECT
dbo_Contact.Contact_Title AS Title,
dbo_Contact.Contact_FirstName AS [First Name],
dbo_Contact.Contact_LastName AS [Last Name],
dbo_Contact.Contact_Suffix AS Suffix,
dbo_Contact.Business_Name AS [Business Name],
dbo_Contact.Contact_Address1 AS [Address 1],
dbo_Contact.Contact_Address2 AS [Address 2],
dbo_Contact.Contact_City AS City,
dbo_Contact.Contact_State AS State,
dbo_Contact.Contact_Zip AS Zip,
dbo_Contact.Contact_Email AS Email
FROM
dbo_Contact
INNER JOIN
dbo_Mailing_Subscribers ON dbo_Contact.[ContactID] = dbo_Mailing_Subscribers.[ContactID]
WHERE
(((dbo_Contact.Contact_Inactive) = False)
AND ((dbo_Mailing_Subscribers.Mailing_ID) = True)
AND ((dbo_Mailing_Subscribers.Subscribed) = True));
For example, if Kurt and William live at 123 A Street and Steve lives at 123 B Avenue, I would only want to return records for Kurt and Steve.

Okay, here is my solution.
1 - Use table aliases to reduce the code footprint.
2 - Change true/false to 1/0 to match T-SQL bit values.
3 - Create a RowNum column partitioned by address1, address2, and zip code, ordered by last name and first name.
4 - Select only one row from each set of duplicates.
;
with cteData as
(
SELECT
C.Contact_Title AS Title,
C.Contact_FirstName AS [First Name],
C.Contact_LastName AS [Last Name],
C.Contact_Suffix AS Suffix,
C.Business_Name AS [Business Name],
C.Contact_Address1 AS [Address 1],
C.Contact_Address2 AS [Address 2],
C.Contact_City AS City,
C.Contact_State AS State,
C.Contact_Zip AS Zip,
C.Contact_Email AS Email,
row_number() over (partition by Contact_Address1, Contact_Address2, Contact_Zip
order by Contact_LastName, Contact_FirstName) as RowNum
FROM
dbo_Contact as C
INNER JOIN
dbo_Mailing_Subscribers as S
ON C.[ContactID] = S.[ContactID]
WHERE
C.Contact_Inactive = 0 AND
S.Mailing_ID = 1 AND
S.Subscribed = 1
)
select * from cteData where RowNum = 1;
In short, one mailing goes out per address, to the person whose last name / first name sorts closest to A.

You can use row_number() and a subquery:
select <list of columns you want>
from (SELECT row_number() over (partition by Contact_Address1, Contact_Address2, Contact_Zip
order by dbo_Contact.ContactID
) as seqnum,
dbo_Contact.Contact_Title AS Title,
dbo_Contact.Contact_FirstName AS [First Name],
dbo_Contact.Contact_LastName AS [Last Name],
dbo_Contact.Contact_Suffix AS Suffix,
dbo_Contact.Business_Name AS [Business Name],
dbo_Contact.Contact_Address1 AS [Address 1],
dbo_Contact.Contact_Address2 AS [Address 2],
dbo_Contact.Contact_City AS City,
dbo_Contact.Contact_State AS State,
dbo_Contact.Contact_Zip AS Zip,
dbo_Contact.Contact_Email AS Email
FROM dbo_Contact
INNER JOIN
dbo_Mailing_Subscribers ON dbo_Contact.[ContactID] = dbo_Mailing_Subscribers.[ContactID]
WHERE
dbo_Contact.Contact_Inactive = 0
AND dbo_Mailing_Subscribers.Mailing_ID = 1
AND dbo_Mailing_Subscribers.Subscribed = 1
) t
where seqnum = 1;
It is unclear what you mean by the "same address" and by "the first contact". For the address, the query uses the two address lines and the zip code. For the "first contact", the query uses the lowest id in the contact table.
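
For a quick check of the row_number() approach against the Kurt / William / Steve example from the question, here is a minimal sketch using Python's built-in sqlite3 module. The trimmed `contact` table, the ContactID values, and the zip code are assumptions for illustration, and SQLite (3.25 or newer, for window functions) stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, trimmed stand-in for the Contact table.
CREATE TABLE contact (ContactID INTEGER PRIMARY KEY, FirstName TEXT, Address1 TEXT, Zip TEXT);
INSERT INTO contact VALUES
  (1, 'Kurt',    '123 A Street', '12345'),
  (2, 'William', '123 A Street', '12345'),
  (3, 'Steve',   '123 B Avenue', '12345');
""")

rows = conn.execute("""
    SELECT FirstName, Address1
    FROM (
        -- Number the contacts within each address; the lowest ContactID gets 1.
        SELECT FirstName, Address1,
               ROW_NUMBER() OVER (PARTITION BY Address1, Zip
                                  ORDER BY ContactID) AS seqnum
        FROM contact
    ) t
    WHERE seqnum = 1
    ORDER BY FirstName
""").fetchall()

print(rows)  # [('Kurt', '123 A Street'), ('Steve', '123 B Avenue')]
```

Changing the ORDER BY inside the window (for example to last name, first name) changes which contact counts as "first" at each address without touching the rest of the query.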

Customers credit system, oracle SQL

I am an overseas student, so I am not familiar with the idea of a "credit system", but I have a database question related to it that I could not fully understand.
Here is the question:
Write a query:
The billing officer would like to know which customers are currently over their credit limit.
The schema of database is:
Sales_Rep (SLSRep_Number [pk], Last, First, Street, City, State, Post_Code,
Total_Commission, Commission_Rate)
Customer (Customer_Number [pk], Last, First, Street, City, State, Post_Code,
Balance, Credit_Limit, SLSRep_Number [fk])
Orders (Order_Number [pk], Order_Date, Customer_Number [fk])
Part (Part_Number [pk], Part_Description, Units_on_Hand, Item_Class, Warehouse_Number, Unit_Price)
Order_Line (Order_Number [pk1], Part_Number [pk2], Number_Ordered, Quoted_Price)
Any idea?
Is it just:
Select customer_number,last,first,balance,credit_limit
from customer
where balance > credit_limit;
or might be:
select * from
(select mytable.customer_number,sum(mytable.number_ordered*mytable.quoted_price) as customer_cost from
(select customer.customer_number,order_line.number_ordered,order_line.quoted_price
from customer,orders,order_line
where customer.customer_number = orders.customer_number
and orders.order_number = order_line.order_number) mytable
group by mytable.customer_number) mytable2,customer
where customer.credit_limit < mytable2.customer_cost
and customer.customer_number = mytable2.customer_number;

The first query is right. It returns the customers whose balance exceeds their credit limit.
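
As a small sanity check of that first query, here is a sketch using Python's built-in sqlite3 module. The sample customers are made up, the Customer table is trimmed to three columns, and SQLite stands in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, trimmed stand-in for the Customer table.
CREATE TABLE customer (customer_number INTEGER PRIMARY KEY, balance REAL, credit_limit REAL);
INSERT INTO customer VALUES
  (1, 1200.0, 1000.0),  -- over the limit
  (2,  500.0, 1000.0),  -- under the limit
  (3, 1000.0, 1000.0);  -- exactly at the limit, so not over it
""")

over_limit = conn.execute("""
    SELECT customer_number, balance, credit_limit
    FROM customer
    WHERE balance > credit_limit
""").fetchall()

print(over_limit)  # [(1, 1200.0, 1000.0)]
```

Only the customer whose balance is strictly greater than the credit limit is returned; a customer sitting exactly at the limit is not reported.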
