For many tables there is both an AccountStats and an AccountBasicStats version.
The same SQL query can return different values depending on whether it reads from Stats or BasicStats. For example:
SELECT
  cs.Date,
  SUM(cs.Impressions) AS Sum_Impressions,
  SUM(cs.Clicks) AS Sum_Clicks,
  SUM(cs.Interactions) AS Sum_Interactions,
  (SUM(cs.Cost) / 1000000) AS Sum_Cost,
  SUM(cs.Conversions) AS Sum_Conversions
FROM
  `{dataset_id}.Customer_{customer_id}` c
LEFT JOIN
  `{dataset_id}.AccountBasicStats_{customer_id}` cs
  -- or, swapping in the segmented table:
  -- `{dataset_id}.AccountStats_{customer_id}` cs
ON
  c.ExternalCustomerId = cs.ExternalCustomerId
WHERE
  c._DATA_DATE = c._LATEST_DATE
  AND c.ExternalCustomerId = {customer_id}
GROUP BY
  1
ORDER BY
  1
The main difference seems to be the ClickType column, which can cause double counting according to the ClickType documentation.
BasicStats appears to be the most accurate and matches AdWords exactly, while Stats shows roughly a 2x-3x increase in impressions.
Is there a way to transform the data so that both queries return the same results? I ask because there are no BasicStats tables for hourly data, which is what I'm interested in.
According to https://groups.google.com/forum/#!topic/adwords-api/QiY_RT9aNlM, it seems there is no way to de-segment the data once ClickType has been brought in.
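For what it's worth, a diagnostic along these lines should show where the inflation comes from (hypothetical; it assumes AccountStats exposes ClickType as an ordinary column):

SELECT
  cs.ClickType,
  SUM(cs.Impressions) AS Sum_Impressions
FROM
  `{dataset_id}.AccountStats_{customer_id}` cs
GROUP BY
  1
ORDER BY
  2 DESC

If the same impression is attributed to more than one ClickType segment, the per-segment sums will add up to more than the BasicStats total, which would be consistent with the 2x-3x inflation described above.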
Related
I'm new to SQL and I am trying to run a query that pulls all our item codes, lot numbers, and quantities on hand.
Each lot number has multiple entries due to adjustments. I need a way of running my query and having it add or subtract to get the actual qty on hand for each lot, and only show me lots that are in the negatives. I have tried playing with SSRS but I can't get it right. I'm using SQL Server 2008 R2.
SELECT
    IMLAYER.ITEM_CODE
    ,IMMSTR.ITEM_DESC
    ,IMLAYER.LOT_NO
    ,IMLAYER.QTY_ON_HAND
FROM
    IMLAYER
    INNER JOIN IMMSTR
        ON IMLAYER.ITEM_CODE = IMMSTR.ITEM_CODE
WHERE
    (IMLAYER.QTY_ON_HAND < 0);
I believe I understand the requirements correctly, but if not please comment and I can update the query:
SELECT
    M.ITEM_CODE
    ,M.ITEM_DESC
    ,L.LOT_NO
    ,SUM(L.QTY_ON_HAND) AS SUM_OF_QTY_ON_HAND
FROM
    IMLAYER L
    INNER JOIN IMMSTR M
        ON L.ITEM_CODE = M.ITEM_CODE
GROUP BY
    M.ITEM_CODE
    ,M.ITEM_DESC
    ,L.LOT_NO
HAVING
    SUM(L.QTY_ON_HAND) < 0
HAVING is the trick you are looking for: it lets you filter on the result of an aggregate function.
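To make the distinction concrete: WHERE filters rows before grouping, while HAVING filters the groups afterward, so only HAVING can see the aggregate. A stripped-down sketch against the same table:

-- Invalid: WHERE runs before grouping, so the aggregate doesn't exist yet.
--   WHERE SUM(L.QTY_ON_HAND) < 0
-- Valid: HAVING runs after grouping and can filter on the aggregate.
SELECT L.LOT_NO, SUM(L.QTY_ON_HAND) AS SUM_OF_QTY_ON_HAND
FROM IMLAYER L
GROUP BY L.LOT_NO
HAVING SUM(L.QTY_ON_HAND) < 0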
I have a complex schema that I am trying to pull data out of for a report. The query joins a bunch of tables together, and I am specifically looking to pull a subset of data where everything for it might be NULL. The original relations between the tables look like this:
Location.DeptFK
Dept.PK
Section.DeptFK
Subsection.SectionFK
Question.SubsectionFK
Answer.QuestionFK, SubmissionFK
Submission.PK, LocationFK
From here my problems begin to compound a little.
SELECT Section.StepNumber + '-' + Question.QuestionNumber AS QuestionNumberVar,
       Question.Question,
       Subsection.Name AS Subsection,
       Section.Name AS Section,
       SUM(CASE WHEN (Answer.Answer = 0) THEN 1 ELSE 0 END) AS NA,
       SUM(CASE WHEN (Answer.Answer = 1) THEN 1 ELSE 0 END) AS AnsNo,
       SUM(CASE WHEN (Answer.Answer = 2) THEN 1 ELSE 0 END) AS AnsYes,
       (SELECT COUNT(DISTINCT Location.Abbreviation)
          FROM Department INNER JOIN Plant ON Location.DepartmentFK = Department.PK
         WHERE (Department.Name = 'insertParameter')) AS total
FROM Department
     INNER JOIN Section ON Department.PK = Section.DepartmentFK
     INNER JOIN Subsection ON Subsection.SectionFK = Section.PK
     INNER JOIN Question ON Question.SubsectionFK = Subsection.PK
     INNER JOIN Answer ON Answer.QuestionFK = Question.PK
     INNER JOIN Submission ON Submission.PK = Answer.SubmissionFK
     INNER JOIN Location ON Location.DepartmentFK = Department.PK
                        AND Location.PK = Submission.PlantFK
WHERE (Department.Name = 'InsertParameter')
  AND (Submission.MonthTested = '1/1/2017')
GROUP BY Question.Question, QuestionNumberVar, Subsection.Name, Section.Name, Section.StepNumber
ORDER BY QuestionNumberVar;
There are 15 total locations; with this query I get 12. If I remove a relation in the join for Location, I get 15 total locations, but my answer data gets multiplied by 15. My issue is that not all locations are required to test at the same time, so their answers should default to NA. They don't get records placed in the DB, so the relationship between Location/Submission is absent.
I have a workaround almost in place via the SELECT COUNT(DISTINCT ...), but the second part is a query for finding what each location answered instead of a sum, which brings the problem right back around. It also has to be dynamic, because the input parameters for a department won't bring a static number of locations back each time.
I am still learning SQL, so any additional material to look at for building this query would also be appreciated. So I guess the big question here is: how would I go about creating default data in this query for any time the Location/Submission relation has a NULL value?
Edit: Dummy Data
QuestionNumberVar | Section | Subsection | Question                  | AnsYes | AnsNo | NA (expected)
1-1.1             | Math    | Algebra    | Did you do your homework? | 10     | 1     | 1 (4)
1-1.2             | Math    | Algebra    | Did your dog eat it?      | 9      | 3     | 0 (3)
2-1.1             | English | Greek      | Did you do your homework? | 8      | 0     | 4 (7)
I have tried making LEFT JOINs at various applicable portions of the code to no avail; all attempts have had no effect on the output. This query feeds into the dataset for an SSRS report. There are a couple of workarounds for this particular section via an expression that takes total locations and subtracts AnsYes and AnsNo to get the true NA value, but as explained above that doesn't help with my next query.
Edit: SQL Server 2012 for those who asked
Edit: my attempt at an ISNULL() on the missing data returns nothing, I suspect because the query already eliminates the "null/missing" data. LEFT JOINing while doing this has also failed. The point of failure is on Submission: if we bind it to Location, there are locations missing, but if we don't bind it, there are multiplied duplicates, because Department has a one-to-many with Location and not vice versa. I am unable to make any schema changes to improve this process.
There is a previous report that I am trying to emulate/update. It used C# logic to process data and ran multiple queries to attain the same data. I don't have this luxury (the previous report exports to Excel directly instead of SSRS). Here is the previous logic used:
select PK from Department where Name = 'InsertParameter';
select PK from Submission where LocationFK = 'Location.PK_var' and MonthTested = '1/1/2017'
Then it runs those results through a loop that processes NULLs into NA using C# logic.
EDIT (mediocre solution): I ended up doing the workaround of making a calculated field that subtracts Yes and No from the total number of locations that have that department. This is a mediocre solution because I didn't solve my original problem and made three datasets that should have been displayed as a single dataset: one for question info, one for each location's answer, and one for locations that didn't participate. If a true answer comes up I will check its validity, but for now the problem is pseudo-solved.
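For completeness, the single-query shape I was trying to get to would presumably be: build the full Location x Question grid with INNER JOINs first, then LEFT JOIN Submission and Answer onto it so absent rows survive and fall through to NA. A rough sketch (untested; it assumes Submission.LocationFK as in the relation list above, one submission per location per month, and that NA should be everything that is neither Yes nor No):

SELECT Section.StepNumber + '-' + Question.QuestionNumber AS QuestionNumberVar,
       Question.Question,
       SUM(CASE WHEN Answer.Answer = 2 THEN 1 ELSE 0 END) AS AnsYes,
       SUM(CASE WHEN Answer.Answer = 1 THEN 1 ELSE 0 END) AS AnsNo,
       -- explicit NA answers plus locations with no submission/answer at all
       COUNT(*) - SUM(CASE WHEN Answer.Answer IN (1, 2) THEN 1 ELSE 0 END) AS NA
FROM Department
     INNER JOIN Location ON Location.DepartmentFK = Department.PK
     INNER JOIN Section ON Section.DepartmentFK = Department.PK
     INNER JOIN Subsection ON Subsection.SectionFK = Section.PK
     INNER JOIN Question ON Question.SubsectionFK = Subsection.PK
     -- everything above forms the full Location x Question grid for the department
     LEFT JOIN Submission ON Submission.LocationFK = Location.PK
                         AND Submission.MonthTested = '1/1/2017'
     LEFT JOIN Answer ON Answer.SubmissionFK = Submission.PK
                     AND Answer.QuestionFK = Question.PK
WHERE Department.Name = 'InsertParameter'
GROUP BY Section.StepNumber + '-' + Question.QuestionNumber, Question.Question
ORDER BY QuestionNumberVar;

The key detail is that the MonthTested filter sits in the LEFT JOIN's ON clause; moving it to the WHERE clause would silently turn the LEFT JOIN back into an INNER JOIN and reintroduce the missing-location problem.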
This is my SQL query, used in Access. It provides the desired result, but I just wanted an opinion on whether the approach is correct, and on how it can be sped up.
SELECT INVDETAILS2.F5
     , INVDETAILS2.F16
     , ExpectedResult.DLID
     , ExpectedResult.NumRows
FROM INVDETAILS2
INNER JOIN (INVDL INNER JOIN ExpectedResult ON INVDL.DLID = ExpectedResult.DLID)
        ON (INVDETAILS2.F14 = ROUND(ExpectedResult.Total))
       AND (INVDETAILS2.F1 = INVDL.RegionCode)
WHERE INVDETAILS2.F29 = '2013-03-06'
  AND INVDETAILS2.F5 IN (SELECT INVDETAILS2.F5
                         FROM (ExpectedResult
                               INNER JOIN INVDL
                                       ON ExpectedResult.DLID = INVDL.DLID)
                              INNER JOIN INVDETAILS2
                                      ON INVDL.RegionCode = INVDETAILS2.F1
                                     AND ROUND(ExpectedResult.Total) = INVDETAILS2.F14
                         WHERE INVDETAILS2.F29 = '2013-03-06'
                         GROUP BY INVDETAILS2.F5
                         HAVING COUNT(ExpectedResult.DLID) < 2)
;
Approximate number of rows:
"ExpectedResult" - millions
"INVDL" - 80,000
"INVDETAILS2" - 300,000 total; approx. 10,000 for one date, and fewer again for each region per date.
Please provide a better query if possible.
Two things you could investigate that might help speed things up:
Indexing
Make sure that you have indexed all of the columns involved in JOINs, WHERE clauses, and GROUP BY clauses.
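In Access that's plain DDL. For example (index names are illustrative; which columns pay off depends on your data):

CREATE INDEX idxInvDetails2F29 ON INVDETAILS2 (F29);
CREATE INDEX idxInvDetails2F1 ON INVDETAILS2 (F1);
CREATE INDEX idxInvDlRegionCode ON INVDL (RegionCode);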
JOIN expressions involving functions
A couple of your JOINs use Round(ExpectedResult.Total), so if you have an index on ExpectedResult.Total your query won't be able to use it. You may get a performance boost if you add a RoundedTotal column (Long Integer, Indexed), populate it with
UPDATE [ExpectedResult] SET [RoundedTotal]=Round([Total])
and then use the RoundedTotal column in your JOINs.
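For example, the outer query's join would then become something like this (a sketch assuming RoundedTotal has been added and populated as above):

SELECT INVDETAILS2.F5
     , INVDETAILS2.F16
     , ExpectedResult.DLID
     , ExpectedResult.NumRows
FROM INVDETAILS2
INNER JOIN (INVDL INNER JOIN ExpectedResult ON INVDL.DLID = ExpectedResult.DLID)
        ON (INVDETAILS2.F14 = ExpectedResult.RoundedTotal)   -- indexable, no Round() call
       AND (INVDETAILS2.F1 = INVDL.RegionCode)
WHERE INVDETAILS2.F29 = '2013-03-06'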
I have created a database for an imaginary solicitors' firm, and my last query is driving me insane. I need to work out the total a solicitor has made in their career with the company: I have time_spent and rate to multiply, and special rate to add (special rate is a one-off charge for corporate contracts, so not many cases have one). The best I could come up with is the code below. It does what I want, but it only displays solicitors working on a case with a special rate applied to it.
I essentially want it to display the result of the query in a table even if the special rate is NULL.
I have ordered the table to show the highest amount first so I can use ROWNUM to show only the top 10% of earners.
CREATE VIEW rich_solicitors AS
SELECT notes.time_spent * rate.rate_amnt + special_rate.s_rate_amnt AS solicitor_made,
notes.case_id
FROM notes,
rate,
solicitor_rate,
solicitor,
case,
contract,
special_rate
WHERE notes.solicitor_id = solicitor.solicitor_id
AND solicitor.solicitor_id = solicitor_rate.solicitor_id
AND solicitor_rate.rate_id = rate.rate_id
AND notes.case_id = case.case_id
AND case.contract_id = contract.contract_id
AND contract.contract_id = special_rate.contract_id
ORDER BY -solicitor_made;
Query:
SELECT *
FROM rich_solicitors
WHERE ROWNUM <= (SELECT COUNT(*)/10
FROM rich_solicitors)
I'm suspicious of your use of ROWNUM in your example query...
Oracle 9i+ supports analytic functions, like ROW_NUMBER and NTILE, that make queries like yours easier. Analytic functions are also ANSI, so the syntax is consistent wherever they're implemented (i.e., not on MySQL or SQLite). I re-wrote your query as:
SELECT x.*
  FROM (SELECT n.time_spent * r.rate_amnt + COALESCE(spr.s_rate_amnt, 0) AS solicitor_made,
               n.case_id,
               NTILE(10) OVER (ORDER BY n.time_spent * r.rate_amnt + COALESCE(spr.s_rate_amnt, 0) DESC) AS rank
          FROM NOTES n
          JOIN SOLICITOR s ON s.solicitor_id = n.solicitor_id
          JOIN SOLICITOR_RATE sr ON sr.solicitor_id = s.solicitor_id
          JOIN RATE r ON r.rate_id = sr.rate_id
          JOIN CASE c ON c.case_id = n.case_id
          JOIN CONTRACT cntrct ON cntrct.contract_id = c.contract_id
          LEFT JOIN SPECIAL_RATE spr ON spr.contract_id = cntrct.contract_id) x
 WHERE x.rank = 1
If you're new to SQL, I recommend using ANSI-92 syntax. Your example uses ANSI-89, which doesn't support OUTER JOINs and is considered deprecated. I used a LEFT OUTER JOIN against the SPECIAL_RATE table because not all cases are likely to have a special rate attached to them. Two details worth noting: the NTILE's ORDER BY repeats the earnings expression because a column alias can't be referenced at the same SELECT level, and it sorts descending so that bucket 1 (x.rank = 1) holds the top 10% of earners.
It's also not recommended to include an ORDER BY in a view, because a view encapsulates the query: no one will know what the default ordering is, and consumers will likely add their own (a potential waste of resources).
You need to LEFT JOIN in the special rate.
If I recall, the old-style Oracle outer-join syntax is:
AND contract.contract_id = special_rate.contract_id (+)
But now special_rate.* can be NULL, so:
+ special_rate.s_rate_amnt
will need to be:
+ COALESCE(special_rate.s_rate_amnt, 0)
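Putting both changes together, the view would look something like this (a sketch in the question's original comma-join style; untested):

CREATE VIEW rich_solicitors AS
SELECT notes.time_spent * rate.rate_amnt
       + COALESCE(special_rate.s_rate_amnt, 0) AS solicitor_made,
       notes.case_id
FROM notes,
     rate,
     solicitor_rate,
     solicitor,
     case,
     contract,
     special_rate
WHERE notes.solicitor_id = solicitor.solicitor_id
  AND solicitor.solicitor_id = solicitor_rate.solicitor_id
  AND solicitor_rate.rate_id = rate.rate_id
  AND notes.case_id = case.case_id
  AND case.contract_id = contract.contract_id
  AND contract.contract_id = special_rate.contract_id (+)
ORDER BY -solicitor_made;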
I have a PHP page running against Postgres. I have three tables: workorders, wo_parts, and part2vendor. I am trying to multiply values from two table columns together: wo_parts has a field called qty and part2vendor has a field called cost. The two tables are joined on wo_parts.pn = part2vendor.pn. I have created a query like this:
$scoreCostQuery = "SELECT SUM(part2vendor.cost*wo_parts.qty) as total_score
FROM part2vendor
INNER JOIN wo_parts
ON (wo_parts.pn=part2vendor.pn)
WHERE workorder=$workorder";
But if I add up the costs of the parts multiplied by the quantities supplied by hand, the total is different from what the script returns. Help, please: I am new to this, but if someone can show me in SQL, I can modify it for Postgres. Thanks.
Without seeing example data, there's no way for us to know why your query totals come out differently than when you do the math by hand. It could be a bad join, so you are getting more or fewer records than you expected. It's also possible that your calculations are off. Pick an example with the smallest number of associated records and compare.
My suggestion is to add a GROUP BY to the query:
SELECT SUM(p.cost * wp.qty) as total_score
FROM part2vendor p
JOIN wo_parts wp ON wp.pn = p.pn
WHERE workorder = $workorder
GROUP BY workorder
FYI: MySQL was designed to allow flexibility in the GROUP BY, while no other DB I've used does. It's the source of numerous questions on SO along the lines of "why does this work in MySQL when it doesn't work on db x...?"
To check that your quantities are correct:
SELECT wp.qty,
p.cost
FROM WO_PARTS wp
JOIN PART2VENDOR p ON p.pn = wp.pn
WHERE p.workorder = $workorder
Check that the numbers are correct for a given order.
You could try a sub-query instead.
(Note: I don't have a Postgres installation to test this on, so consider this more like pseudocode than a working example. It does work in MySQL, though.)
SELECT
    SUM(p.`score`) AS 'total_score'
FROM part2vendor AS p2v
INNER JOIN (
    SELECT pn, cost * qty AS `score`
    FROM wo_parts
) AS p
    ON p.pn = p2v.pn
WHERE p2v.workorder = $workorder
In the question you say the cost column is in part2vendor, but in this query the sub-select references wo_parts.cost. If the wo_parts table has its own cost column, that's the source of the problem.
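If so, qualifying every column makes the source of each value explicit. A minimal sketch of the original query with fully qualified names (it assumes, per the question, that cost lives on part2vendor, and qty and workorder on wo_parts):

SELECT SUM(part2vendor.cost * wo_parts.qty) AS total_score
FROM part2vendor
INNER JOIN wo_parts ON wo_parts.pn = part2vendor.pn
WHERE wo_parts.workorder = $workorder;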