How can I pivot a dataset in Google BigQuery? - sql

I have a massive dataset with this schema:
Customer INTEGER
CategoryID INTEGER
CategoryName STRING
ProjectStage INTEGER
NextStepID INTEGER
NextStepName STRING
NextStepIsAnchor BOOLEAN
I need to get a result set where each customer appears on only one row and his/her next steps are spread across columns like this:
Customer | CategoryID | CategoryName | ProjectStage | NextStep1ID | NextStep1Name | NextStep2ID | NextStep2Name | ... etc.
I tried to play with NTH function of BigQuery but it works only for the first occurrence of the NextStepID:
SELECT
customer,
nth(1, NextStepID)
FROM [2015_05.customers_wunique_nextsteps]
group by customer
but when I try to add more columns:
SELECT
customer,
nth(1, NextStepID),
nth(2, NextStepID)
FROM [2015_05.customers_wunique_nextsteps]
group by customer
I get this error:
Error: Function 'NTH(2, [NextStepID])' cannot be used in a distributed
query, this function can only be correctly computed for queries that
run on a single node.
Any ideas?
Right now I "pivot" the results with Excel and a small VBA script, but as the datasets grow, the calculation time exceeds all limits...
Thanks in advance! :)

The NTH function applies to REPEATED fields, where it picks the nth repeated element (the error message could be clearer). So the first step is to build a REPEATED field out of NextStepID, which can be done with the NEST aggregation function. Then you can use NTH as a scoped aggregation function:
SELECT
Customer,
NTH(1, NextStepID) WITHIN RECORD AS NextStepID1,
NTH(2, NextStepID) WITHIN RECORD AS NextStepID2,
NTH(3, NextStepID) WITHIN RECORD AS NextStepID3
FROM (
SELECT Customer, NEST(NextStepID) AS NextStepID
FROM [2015_05.customers_wunique_nextsteps] GROUP BY Customer)
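For readers without access to BigQuery legacy SQL, the same "one row per customer, nth step per column" shape can be sketched with a different technique: number each customer's rows with ROW_NUMBER, then use conditional aggregation. This is not the NEST/NTH approach above, only an analog. The sketch below runs it in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer); the table name, column names, and sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (customer, next_step_id). Not from the original dataset.
rows = [(1, 10), (1, 20), (1, 30), (2, 40), (2, 50)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE next_steps (customer INTEGER, next_step_id INTEGER)")
conn.executemany("INSERT INTO next_steps VALUES (?, ?)", rows)

# Number each customer's steps, then spread positions 1..3 into columns.
# MAX ignores the NULLs produced by the non-matching CASE branches.
pivot_sql = """
SELECT customer,
       MAX(CASE WHEN rn = 1 THEN next_step_id END) AS next_step_id1,
       MAX(CASE WHEN rn = 2 THEN next_step_id END) AS next_step_id2,
       MAX(CASE WHEN rn = 3 THEN next_step_id END) AS next_step_id3
FROM (
    SELECT customer,
           next_step_id,
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY next_step_id) AS rn
    FROM next_steps
)
GROUP BY customer
ORDER BY customer
"""
pivoted = conn.execute(pivot_sql).fetchall()
print(pivoted)  # [(1, 10, 20, 30), (2, 40, 50, None)]
```

As with NTH, a customer with fewer steps than columns gets NULL (None in Python) in the remaining positions, and the number of output columns still has to be fixed in the query text.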

Related

SQL: CASE WHEN having AVG() as condition not giving right output

I have a table of unique users that each has a "rating" column (it's an average rating they give out of all their ratings given in a different table of reviews). I want to add another column to my table, which specifies either them giving a rating that is above the average of all ratings of all users (hence I use the AVG() function), below or at average (I call it "bias"). In other words, I want to see whether each user gives on average higher or lower ratings than the total average. I understand the limitedness of this query, and ideally I would include an interval (i.e. within 0.5 points below or above average still counts as average) but I can't seem to make even the simplest query work.
I've been using the Yelp dataset from a Coursera course, but I tried to create a sample that produces the same result that I do not want - just one row. I want to have this categorization for each row, hence it should return 3 rows in this example, "below average" in the first two and "above average" in the third. However, the code below produces just one row. I have been working with R and this seems like I am using incorrect syntax, but after 30 minutes of searching the web I cannot find a solution.
I am working in SQLite and want to use SQLite syntax, as required by the Coursera course.
CREATE TABLE test
(
id integer primary key,
rating integer
);
INSERT INTO test
(id, rating)
VALUES
(1, 1);
INSERT INTO test
(id, rating)
VALUES
(2, 3);
INSERT INTO test
(id, rating)
VALUES
(3, 8);
SELECT id,
rating,
CASE
WHEN rating > AVG(rating) THEN "above average"
WHEN rating < AVG(rating) THEN "below average"
ELSE "no bias"
END AS "bias"
FROM test
You can't use the aggregate function AVG() like this: without a GROUP BY it aggregates the whole table into a single row.
But you can do it with the AVG() window function:
SELECT id,
rating,
CASE
WHEN rating > AVG(rating) OVER () THEN 'above average'
WHEN rating < AVG(rating) OVER () THEN 'below average'
ELSE 'no bias'
END AS "bias"
FROM test
Results:
| id | rating | bias |
| --- | ------ | ------------- |
| 1 | 1 | below average |
| 2 | 3 | below average |
| 3 | 8 | above average |
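To see the difference in row counts directly, here is a small sketch using Python's built-in sqlite3 module against the sample table from the question. It assumes a SQLite build with window-function support (3.25 or newer); the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, rating INTEGER)")
conn.executemany("INSERT INTO test (id, rating) VALUES (?, ?)",
                 [(1, 1), (2, 3), (3, 8)])

# A plain AVG() turns the whole SELECT into an aggregate query: one output row.
collapsed = conn.execute("SELECT id, rating, AVG(rating) FROM test").fetchall()

# AVG() OVER () keeps every row and makes the overall average available to each.
windowed = conn.execute("""
    SELECT id,
           rating,
           CASE
               WHEN rating > AVG(rating) OVER () THEN 'above average'
               WHEN rating < AVG(rating) OVER () THEN 'below average'
               ELSE 'no bias'
           END AS bias
    FROM test
    ORDER BY id
""").fetchall()

print(len(collapsed))  # 1
print(len(windowed))   # 3
```

The overall average here is 4, so ratings 1 and 3 fall below it and rating 8 falls above it.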
SELECT id,
rating,
CASE
WHEN rating > (select AVG(rating) from test) THEN 'above average'
WHEN rating < (select AVG(rating) from test) THEN 'below average'
ELSE 'no bias'
END AS "bias"
FROM test
AVG is an aggregate function and works in conjunction with GROUP BY.
When you do not specify anything in the GROUP BY section, it aggregates the whole table, reducing it to one row.
Generally, you cannot select aggregated and non-aggregated columns together without listing the non-aggregated columns in the GROUP BY list. I am not a big fan of DBMSs that allow this behavior (SQLite seems to be an offender).
What I did in the query above is calculate the average of the whole table using a subquery, and then compare each row against that average.
Or, as others have pointed out, you can go with window functions, where you apply a function over the part of the data defined by your window. They look like their regular aggregate counterparts, but you will notice the OVER keyword, which specifies that they are applied over a window. In the OVER clause you can partition your data or use it as a whole. For example, if you had multiple stores and a sales amount per day for each store, you could partition by store to compute the per-store average.
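The per-store example can be sketched as follows, again with Python's sqlite3 module (SQLite 3.25 or newer for window functions). The sales table, its columns, and the figures are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, day INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", 1, 10.0), ("A", 2, 20.0), ("B", 1, 100.0), ("B", 2, 300.0)],
)

# PARTITION BY store restricts the window to rows of the same store;
# an empty OVER () uses the whole table as the window.
per_store = conn.execute("""
    SELECT store,
           day,
           amount,
           AVG(amount) OVER (PARTITION BY store) AS store_avg,
           AVG(amount) OVER ()                   AS overall_avg
    FROM sales
    ORDER BY store, day
""").fetchall()

for row in per_store:
    print(row)
# ('A', 1, 10.0, 15.0, 107.5)
# ('A', 2, 20.0, 15.0, 107.5)
# ('B', 1, 100.0, 200.0, 107.5)
# ('B', 2, 300.0, 200.0, 107.5)
```

Every input row survives, and each one carries both its own store's average and the overall average, so either can be used in a CASE comparison.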

Access SQL: Union query that generates pop-up

Situation: I have three tables of parts: Raw Material, Individual Parts, and Assembled Parts. I have created a union query to list all the part numbers as well as their minimum and opening levels of inventory. I also have an inventory table that uses all the part numbers. I then used the union query in another query to find the current inventory and a current balance. When I attempt to open this query I get an input box asking for CurrentInventory.
Question: How do I get the input box to stop appearing?
Code:
Tables:
Raw Material, Individual Parts, and Assembled Parts all have similar formats that begin with the following
PartNum | Min | Open
1 50 100
Inventory:
PartNum | Year | Week | In | Out
1 2015 31 20 10
Queries
Union Query:
SELECT PartNum, [Open], [Min]
FROM [Raw Material]
UNION
SELECT PartNum, [Open], [Min]
FROM [Individual Parts]
UNION
SELECT PartNum, [Open], [Min]
FROM [Assembled Parts];
Which results in:
PartNum | Min | Open
1 50 100
etc.
Current Inventory:
SELECT AllParts.PartNum, AllParts.Open, Sum(Inventory.[In]) AS SumOfIn,
Sum(Inventory.Out) AS SumOfOut,
[Open]+[SumOfIn]-[SumOfOut] AS CurrentInventory,
AllParts.Min, [CurrentInventory]-[Min] AS CurrentBalance
FROM AllParts
INNER JOIN Inventory ON AllParts.PartNum = Inventory.PartNum
GROUP BY AllParts.PartNum, AllParts.Open, AllParts.Min,
[CurrentInventory]-[Min], [Open]+[In]-[Out];
This is the query that produces the input box for CurrentInventory when I run it. If I don't enter anything, it doesn't affect the results. However, when I run the report I generate from this query, the column shows what I entered and not the actual value.
Even though you are aliasing a calculated result as "CurrentInventory", you can't reference that calculation by its alias in the same query.
Every time "CurrentInventory" appears (except after the "AS"), you need to replace it with [Open]+[SumOfIn]-[SumOfOut]
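As a rough illustration of the "repeat the expression instead of the alias" idea, here is a sketch in SQLite via Python's sqlite3 module rather than Access. The table and column names (all_parts, inventory, open_qty, min_qty, qty_in, qty_out) and the sample rows are hypothetical stand-ins for the ones in the question. Each output column is spelled out from base columns and aggregates, and the GROUP BY lists only the non-aggregated columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the AllParts union query and the Inventory table.
conn.execute("CREATE TABLE all_parts (part_num INTEGER, open_qty INTEGER, min_qty INTEGER)")
conn.execute("CREATE TABLE inventory (part_num INTEGER, qty_in INTEGER, qty_out INTEGER)")
conn.execute("INSERT INTO all_parts VALUES (1, 100, 50)")
conn.executemany("INSERT INTO inventory VALUES (?, ?, ?)", [(1, 20, 10), (1, 5, 3)])

# No output alias is referenced anywhere: the inventory expression is
# written out again inside the balance calculation.
result = conn.execute("""
    SELECT p.part_num,
           p.open_qty + SUM(i.qty_in) - SUM(i.qty_out)             AS current_inventory,
           p.open_qty + SUM(i.qty_in) - SUM(i.qty_out) - p.min_qty AS current_balance
    FROM all_parts AS p
    INNER JOIN inventory AS i ON p.part_num = i.part_num
    GROUP BY p.part_num, p.open_qty, p.min_qty
""").fetchall()

print(result)  # [(1, 112, 62)]
```

With an opening level of 100, 25 units in and 13 out, the current inventory is 112, and subtracting the minimum of 50 leaves a balance of 62.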

Get the sum of a column in oracle reports

I am trying to get the sum of a column using oracle reports, but with a condition. For example I have three columns, a store column, a fruit column and a cost column. I want to get the sum cost of all the "bananas", or whichever fruit you pick, bought in a particular store.
Ex:
store1------------banana----------------$5.00
store1------------apple-----------------$2.00
store1------------banana----------------$3.50
store 1 bananas = $8.50 <- this is what I want
store 1 sum = $10.50
store2------------apples----------------$1.50
etc
I've tried making a formula in the data model, but then I'd have to supply it with the store name.
You can use the SUM function, which is an ANSI SQL function. You also need to use GROUP BY:
select store_name, fruit_name, sum(cost) as Total_Cost
from your_table
group by store_name, fruit_name
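A quick way to check this grouping against the example rows from the question is to run it in SQLite through Python's sqlite3 module. The table name purchases is an assumption; the column names follow the query above, and the fruit names are normalized to the singular so that they group together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (store_name TEXT, fruit_name TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        ("store1", "banana", 5.00),
        ("store1", "apple", 2.00),
        ("store1", "banana", 3.50),
        ("store2", "apple", 1.50),
    ],
)

# One row per (store, fruit) pair, with the summed cost for that pair.
totals = conn.execute("""
    SELECT store_name, fruit_name, SUM(cost) AS total_cost
    FROM purchases
    GROUP BY store_name, fruit_name
    ORDER BY store_name, fruit_name
""").fetchall()

print(totals)
# [('store1', 'apple', 2.0), ('store1', 'banana', 8.5), ('store2', 'apple', 1.5)]
```

The store1 banana row comes out as 8.5, matching the $8.50 the question asks for; grouping by store_name alone would give the per-store sum of 10.5 instead.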

Adding a percent column to MS Access Query

I'm trying to add a column which calculates percentages of different products in MS Access Query. Basically, this is the structure of the query that I'm trying to reach:
Product | Total | Percentage
Prod1 | 15 | 21.13%
Prod2 | 23 | 32.39%
Prod3 | 33 | 46.48%
Product | 71 | 100%
The formula for finding the percent I use is: ([Total Q of a Product]/[Totals of all Products])*100, but when I try to use the expression builder (since my SQL skills are basic) in MS Access to calculate it..
= [CountOfProducts] / Sum([CountOfProducts])
..I receive an error message "Cannot have aggregate function in GROUP BY clause.. (and the expression goes here)". I also tried the option with two queries: one that calculates only the totals and another that use the first one to calculate the percentages, but the result was the same.
I'll be grateful if someone can help me with this.
You can get all but the last row of your desired output with this query.
SELECT
y.Product,
y.Total,
Format((y.Total/sub.SumOfTotal),'#.##%') AS Percentage
FROM
YourTable AS y,
(
SELECT Sum(Total) AS SumOfTotal
FROM YourTable
) AS sub;
Since that query does not include a JOIN or WHERE condition, it returns a cross join between the table and the single row of the subquery.
If you need the last row from your question example, you can UNION the query with another which returns the fabricated row you want. In this example, I used a custom Dual table which is designed to always contain one and only one row. But you could substitute another table or query which returns a single row.
SELECT
y.Product,
y.Total,
Format((y.Total/sub.SumOfTotal),'#.##%') AS Percentage
FROM
YourTable AS y,
(
SELECT Sum(Total) AS SumOfTotal
FROM YourTable
) AS sub
UNION ALL
SELECT
'Product',
DSum('Total', 'YourTable'),
'100%'
FROM Dual;
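The cross-join idea is not specific to Access. As a sketch, the same calculation can be run in SQLite through Python's sqlite3 module; here ROUND stands in for Access's Format, a plain SELECT without FROM stands in for the Dual table, and the table name products is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product TEXT, total INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("Prod1", 15), ("Prod2", 23), ("Prod3", 33)])

# Each product row is paired with the single-row subquery holding the grand
# total; multiplying by 100.0 forces floating-point division.
percentages = conn.execute("""
    SELECT p.product,
           p.total,
           ROUND(100.0 * p.total / s.sum_of_total, 2) AS percentage
    FROM products AS p,
         (SELECT SUM(total) AS sum_of_total FROM products) AS s
    UNION ALL
    SELECT 'Product', (SELECT SUM(total) FROM products), 100.0
""").fetchall()

for row in percentages:
    print(row)
```

The three product rows should come out at 21.13, 32.39 and 46.48 percent, followed by the fabricated total row with 71 and 100.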

How do I specify a default value in a MS Access query?

I have three tables similar to the following:
tblInvoices: Number | Date | Customer
tblInvDetails: Invoice | Quantity | Rate | Description
tblPayments: Invoice | Date | Amount
I have created a query called exInvDetails that adds an Amount column to tblInvDetails:
SELECT tblInvDetails.*, [tblInvDetails].[Quantity]*[tblInvDetails].[Rate] AS Amount
FROM tblInvDetails;
I then created a query exInvoices to add Total and Balance columns to tblInvoices:
SELECT tblInvoices.*,
(SELECT Sum(exInvDetails.Amount) FROM exInvDetails WHERE exInvDetails.Invoice = tblInvoices.Number) AS Total,
(SELECT Sum(tblPayments.Amount) FROM tblPayments WHERE tblPayments.Invoice = tblInvoices.Number) AS Payments,
(Total-Payments) AS Balance
FROM tblInvoices;
If there are no corresponding payments in tblPayments, the fields are null instead of 0. Is there a way to force the resulting query to put a 0 in this column?
Use the Nz() function, as in Nz(colName, 0). This returns colName unless it is null, in which case it returns the second parameter (in this case, 0).
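Nz() is specific to Access. Outside Access, the standard SQL COALESCE function plays the same role. The sketch below shows the effect in SQLite through Python's sqlite3 module, using simplified, hypothetical versions of the invoice and payment tables in which the invoice total is stored directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for tblInvoices and tblPayments.
conn.execute("CREATE TABLE invoices (number INTEGER PRIMARY KEY, total REAL)")
conn.execute("CREATE TABLE payments (invoice INTEGER, amount REAL)")
conn.executemany("INSERT INTO invoices VALUES (?, ?)", [(1, 100.0), (2, 250.0)])
conn.executemany("INSERT INTO payments VALUES (?, ?)", [(1, 40.0), (1, 10.0)])

# Invoice 2 has no payments, so the SUM subquery yields NULL for it;
# COALESCE replaces that NULL with 0 before the subtraction.
balances = conn.execute("""
    SELECT i.number,
           i.total,
           COALESCE((SELECT SUM(p.amount) FROM payments AS p
                     WHERE p.invoice = i.number), 0) AS payments,
           i.total - COALESCE((SELECT SUM(p.amount) FROM payments AS p
                               WHERE p.invoice = i.number), 0) AS balance
    FROM invoices AS i
    ORDER BY i.number
""").fetchall()

print(balances)  # [(1, 100.0, 50.0, 50.0), (2, 250.0, 0, 250.0)]
```

Without the COALESCE wrapper, both the payments and the balance columns for invoice 2 would be NULL, because arithmetic with NULL yields NULL.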
I think the default value can be the field itself. For example, my field is ID in the tblEmpPCs table, and I used the criterion below.
IIf([Forms]![frmSearchByEmp]![cboTurul]="ID",[forms]![frmSearchByEmp]![Ezemshigch],[tblEmpPCs].[ID])
Here the false part of the IIf is the default value of the criterion.