BigQuery: how to dynamically set a value in a query - google-bigquery

In the query below, I get the product name, the count of each item in the product, and the count of distinct customers buying the product, and then calculate the percentage of customers buying the product. I am hard-coding the total number of unique customers (1500). I want to know how I can compute this dynamically in my query. Joining based on purchase date is the only solution that comes to mind. Is there a more efficient way to achieve this?
Query below:
(SELECT ProdName, COUNT(ProdName) AS No_of_Prods,
EXACT_COUNT_DISTINCT(cust) as No_of_cust,
(EXACT_COUNT_DISTINCT(cust)/1500)*100 as Percentage_of_cust
FROM
[Prod-cust]
WHERE
(STRFTIME_UTC_USEC(Timestamp,"%Y%m%d")) = (STRFTIME_UTC_USEC(DATE_ADD(CURRENT_TIMESTAMP(), -1, "day"), "%Y%m%d"))
GROUP BY
1,
ORDER BY
2 DESC)
Query for the total number of unique customers:
(SELECT EXACT_COUNT_DISTINCT(cust),
FROM
[Prod-cust]
WHERE
(STRFTIME_UTC_USEC(Timestamp,"%Y%m%d")) = (STRFTIME_UTC_USEC(DATE_ADD(CURRENT_TIMESTAMP(), -1, "day"), "%Y%m%d")))

SELECT
one.ProdName AS ProdName,
one.No_of_Prods AS No_of_Prods,
one.No_of_cust AS No_of_cust,
(one.No_of_cust/all.No_of_cust)*100 AS Percentage_of_cust
FROM (
SELECT
ProdName,
COUNT(ProdName) AS No_of_Prods,
EXACT_COUNT_DISTINCT(cust) AS No_of_cust
FROM [Prod-cust]
WHERE
(STRFTIME_UTC_USEC(TIMESTAMP,"%Y%m%d")) = (STRFTIME_UTC_USEC(DATE_ADD(CURRENT_TIMESTAMP(), -1, "day"), "%Y%m%d"))
GROUP BY 1
) AS one
CROSS JOIN (
SELECT EXACT_COUNT_DISTINCT(cust) AS No_of_cust,
FROM [Prod-cust]
WHERE
(STRFTIME_UTC_USEC(TIMESTAMP,"%Y%m%d")) = (STRFTIME_UTC_USEC(DATE_ADD(CURRENT_TIMESTAMP(), -1, "day"), "%Y%m%d"))
) AS all
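For anyone who wants to try the CROSS JOIN idea locally, here is a minimal sketch using Python's built-in sqlite3 module rather than BigQuery. The table name prod_cust, the sample rows, and the alias total are all made up for illustration, COUNT(DISTINCT ...) stands in for EXACT_COUNT_DISTINCT, and the date filter is left out to keep it short.

```python
import sqlite3

# Hypothetical sample rows standing in for the [Prod-cust] table.
rows = [
    ("apple", "c1"),
    ("apple", "c2"),
    ("apple", "c1"),
    ("pear", "c2"),
    ("pear", "c3"),
    ("plum", "c4"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prod_cust (ProdName TEXT, cust TEXT)")
conn.executemany("INSERT INTO prod_cust VALUES (?, ?)", rows)

# The one-row subquery computes the total once; CROSS JOIN attaches it
# to every per-product row, so nothing is hard-coded.
query = """
SELECT one.ProdName,
       one.No_of_Prods,
       one.No_of_cust,
       100.0 * one.No_of_cust / total.No_of_cust AS Percentage_of_cust
FROM (
    SELECT ProdName,
           COUNT(ProdName) AS No_of_Prods,
           COUNT(DISTINCT cust) AS No_of_cust
    FROM prod_cust
    GROUP BY ProdName
) AS one
CROSS JOIN (
    SELECT COUNT(DISTINCT cust) AS No_of_cust
    FROM prod_cust
) AS total
ORDER BY one.No_of_Prods DESC, one.ProdName
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With four distinct customers overall, apple and pear each come out at 50% and plum at 25%.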

SELECT number, RATIO_TO_REPORT(number) OVER() ratio
FROM (SELECT 10 number),(SELECT 40 number),(SELECT 70 number),(SELECT 20 number),(SELECT 30 number)
RATIO_TO_REPORT adds up all the numbers and gives you, for each row, the ratio number / SUM(number).
Updated: If we need to count the number of DISTINCT customers, then a COUNT_DISTINCT() OVER() would work as a sub-query:
SELECT number, number/distincts ratio
FROM (
SELECT number, COUNT_DISTINCT(id) OVER() distincts
FROM (SELECT 10 number, 1 id),(SELECT 30 number, 2 id),
(SELECT 60 number, 3 id),(SELECT 20 number, 1 id),
(SELECT 40 number, 1 id)
)
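To make the arithmetic behind the two window functions concrete, here is a small plain-Python sketch of what they compute on the sample rows of the second query. It only mimics the calculation and does not call BigQuery; the variable names are my own.

```python
# Same sample data as the second query: (number, id) pairs.
rows = [(10, 1), (30, 2), (60, 3), (20, 1), (40, 1)]

# RATIO_TO_REPORT(number) OVER(): each number over the sum of all numbers.
total = sum(number for number, _ in rows)
ratios = [number / total for number, _ in rows]

# COUNT_DISTINCT(id) OVER(): one distinct count shared by every row.
distincts = len({id_ for _, id_ in rows})
per_distinct = [number / distincts for number, _ in rows]

print(ratios)
print(per_distinct)
```

The point is that the denominator is computed over the whole result set once, and every row is divided by that same value.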

How to get a fraction of counters of subquery from different subqueries in one select?

I have a table with reviews for products. I want to sort product_ids that have more than 100 verified reviews (a verified review is a review with verified_purchase=True) by the fraction of 5-star reviews to all reviews. I tried to implement this in one select, but after numerous tries I ended up needing to create views. I managed to write a query that counts the number of 5-star reviews, but can't do better. Can anybody give me a hint?
My best query:
select *,count(*)
from (
select *
from reviews
where star_rating = 5
) low_reviews
left join (
select distinct filtered_reviews.product_id
from (
select *
from (
select verified_reviews.product_id, count(*) as verified_reviews_number
from (
select *
from reviews
where verified_purchase=True
) as verified_reviews
) as counted_verified_reviews
where counted_verified_reviews.verified_reviews_number > 100
) as filtered_reviews
) filtered_product_ids on low_reviews.product_id = filtered_product_ids.product_id;
Data example:
review_id customer_id product_id star_rating helpful_votes total_votes vine verified_purshase review_headline review_body review_date
14830128 R158AS05ZMH7VQ 0615349439 5 2 2 N false Planting a Church ... Witnessing To Dracula... 2011-02-14
I want to sort product_ids that have more than 100 verified reviews (a verified review is a review with verified_purchase=True) by the fraction of 5-star reviews to all reviews.
You don't provide sample data, but I would expect a query like this:
select product_id
from reviews
where verified_purchase
group by product_id
having count(*) > 100
order by avg( (star_rating = 5)::int ) desc;
The expression avg( (star_rating = 5)::int ) is shorthand for count(*) filter (where star_rating = 5) * 1.0 / count(star_rating). It works because the cast converts the boolean star_rating = 5 to an int, 1 for true and 0 for false, so the average is the proportion of rows where it is true.
Actually, the above assumes that you only care about star ratings for verified purchases. If you want to include all reviews (even non-verified ones) in the ordering:
select product_id
from reviews
group by product_id
having count(*) filter (where verified_purchase) > 100
order by avg( (star_rating = 5)::int ) desc;
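If you want to see the averaging trick on real rows, here is a rough sketch with Python's sqlite3 module. It is not Postgres: SQLite already evaluates a comparison to 1 or 0, so no ::int cast is needed, and SUM(verified_purchase) stands in for the FILTER clause. The rows are invented, the column is assumed to be star_rating as in the question's data example, and the threshold is lowered from 100 to 1 so a handful of rows is enough.

```python
import sqlite3

# Hypothetical miniature of the reviews table.
rows = [
    # (product_id, star_rating, verified_purchase)
    ("A", 5, 1), ("A", 5, 1), ("A", 3, 1), ("A", 1, 0),
    ("B", 5, 1), ("B", 5, 1),
    ("C", 5, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (product_id TEXT, star_rating INT, verified_purchase INT)"
)
conn.executemany("INSERT INTO reviews VALUES (?, ?, ?)", rows)

# Variant 1: only verified reviews count towards the fraction.
verified_only = conn.execute("""
    SELECT product_id, AVG(star_rating = 5) AS five_star_fraction
    FROM reviews
    WHERE verified_purchase
    GROUP BY product_id
    HAVING COUNT(*) > 1
    ORDER BY five_star_fraction DESC
""").fetchall()

# Variant 2: all reviews count, but the product still needs
# more than one verified review to qualify.
all_reviews = conn.execute("""
    SELECT product_id, AVG(star_rating = 5) AS five_star_fraction
    FROM reviews
    GROUP BY product_id
    HAVING SUM(verified_purchase) > 1
    ORDER BY five_star_fraction DESC
""").fetchall()

print(verified_only)
print(all_reviews)
```

Product A shows the difference between the two variants: its unverified 1-star review drops its fraction from 2/3 to 1/2 once all reviews are included.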

Find duplicates in MS SQL table

I know that this question has been asked several times, but I still cannot figure out why my query is returning values which are not duplicates. I want my query to return only the records which have an identical value in the column Credit. The query executes without any errors, but values which are not duplicated are also being returned. This is my query:
Select
_bvGLTransactionsFull.AccountDesc,
_bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate,
_bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit,
_bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName
From
_bvGLAccountsFinancial Inner Join
_bvGLTransactionsFull On _bvGLAccountsFinancial.AccountLink =
_bvGLTransactionsFull.AccountLink
Where
_bvGLTransactionsFull.Credit
IN
(SELECT Credit AS NumOccurrences
FROM _bvGLTransactionsFull
GROUP BY Credit
HAVING (COUNT(Credit) > 1 ) )
Group By
_bvGLTransactionsFull.AccountDesc, _bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate, _bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit, _bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName, _bvGLAccountsFinancial.Master_Sub_Account,
IsNumeric(_bvGLTransactionsFull.Reference), _bvGLTransactionsFull.TrCode
Having
_bvGLTransactionsFull.TxDate > 01 / 11 / 2014 And
_bvGLTransactionsFull.Reference Like '5_____' And
_bvGLTransactionsFull.Credit > 0.01 And
_bvGLAccountsFinancial.Master_Sub_Account = '90210'
That's because you're matching on the credit field back to your table, which contains duplicates. You need to isolate the rows that are duplicated with ROW_NUMBER:
;WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY CREDIT ORDER BY (SELECT NULL)) AS RN
FROM _bvGLTransactionsFull)
Select
CTE.AccountDesc,
_bvGLAccountsFinancial.Description,
CTE.TxDate,
CTE.Description,
CTE.Credit,
CTE.Reference,
CTE.UserName
From
_bvGLAccountsFinancial Inner Join
CTE On _bvGLAccountsFinancial.AccountLink = CTE.AccountLink
WHERE CTE.RN > 1
Group By
CTE.AccountDesc, _bvGLAccountsFinancial.Description,
CTE.TxDate, CTE.Description,
CTE.Credit, CTE.Reference,
CTE.UserName, _bvGLAccountsFinancial.Master_Sub_Account,
IsNumeric(CTE.Reference), CTE.TrCode
Having
CTE.TxDate > 01 / 11 / 2014 And
CTE.Reference Like '5_____' And
CTE.Credit > 0.01 And
_bvGLAccountsFinancial.Master_Sub_Account = '90210'
Just as a side note, I would consider using aliases to shorten your queries and make them more readable. Prefixing the table name before each column in a join is very difficult to read.
I trust your code in terms of extracting all the data per your criteria, so let me take a different approach and keep your script as-is. First, store all the records in a temp table.
Select
_bvGLTransactionsFull.AccountDesc,
_bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate,
_bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit,
_bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName
-- temp table
INTO #tmpTable
From
_bvGLAccountsFinancial Inner Join
_bvGLTransactionsFull On _bvGLAccountsFinancial.AccountLink =
_bvGLTransactionsFull.AccountLink
Where
_bvGLTransactionsFull.Credit
IN
(SELECT Credit AS NumOccurrences
FROM _bvGLTransactionsFull
GROUP BY Credit
HAVING (COUNT(Credit) > 1 ) )
Group By
_bvGLTransactionsFull.AccountDesc, _bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate, _bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit, _bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName, _bvGLAccountsFinancial.Master_Sub_Account,
IsNumeric(_bvGLTransactionsFull.Reference), _bvGLTransactionsFull.TrCode
Having
_bvGLTransactionsFull.TxDate > 01 / 11 / 2014 And
_bvGLTransactionsFull.Reference Like '5_____' And
_bvGLTransactionsFull.Credit > 0.01 And
_bvGLAccountsFinancial.Master_Sub_Account = '90210'
Then remove the "single occurrence" data by creating a row index and removing every row whose index is 1.
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY Credit ORDER BY Credit) AS rowIdx
, *
FROM #tmpTable) AS innerTmp
WHERE
rowIdx != 1
You can change the grouping through PARTITION BY <column name>.
Should you have any concerns, please raise them; this is how I have understood your case so far.
EDIT: To include all rows for those credits that have duplicates:
SELECT
tmp1.*
FROM #tmpTable tmp1
RIGHT JOIN (
SELECT
Credit
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY Credit ORDER BY Credit) AS rowIdx
, *
FROM #tmpTable) AS innerTmp
WHERE
rowIdx != 1
) AS tmp2
ON tmp1.Credit = tmp2.Credit
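One more thing that may explain the unduplicated values: in the original query the IN subquery counts Credit values over the whole table, while the date, reference and account filters are applied afterwards, so a value that is duplicated table-wide can be left with a single row once the filters run. A minimal sketch of that effect, using Python's sqlite3 module with a made-up, simplified table tx (not the real _bvGLTransactionsFull schema):

```python
import sqlite3

# Hypothetical, simplified stand-in for the transactions table.
rows = [
    # (Reference, Credit, Account)
    ("500001", 100.0, "90210"),
    ("500002", 100.0, "90210"),   # duplicate Credit inside the filtered set
    ("500003", 250.0, "90210"),   # 250.0 is duplicated only by the row below,
    ("600001", 250.0, "11111"),   # which the filters exclude
    ("500004", 75.0, "90210"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (Reference TEXT, Credit REAL, Account TEXT)")
conn.executemany("INSERT INTO tx VALUES (?, ?, ?)", rows)

# Counting duplicates over the whole table, then filtering.
count_then_filter = conn.execute("""
    SELECT Reference, Credit FROM tx
    WHERE Credit IN (SELECT Credit FROM tx GROUP BY Credit HAVING COUNT(*) > 1)
      AND Reference LIKE '5_____' AND Account = '90210'
    ORDER BY Reference
""").fetchall()

# Filtering first, then counting duplicates only among the filtered rows.
filter_then_count = conn.execute("""
    WITH filtered AS (
        SELECT Reference, Credit FROM tx
        WHERE Reference LIKE '5_____' AND Account = '90210'
    )
    SELECT Reference, Credit FROM filtered
    WHERE Credit IN (
        SELECT Credit FROM filtered GROUP BY Credit HAVING COUNT(*) > 1
    )
    ORDER BY Reference
""").fetchall()

print(count_then_filter)
print(filter_then_count)
```

The first result still contains the lone 250.0 row; the second keeps only the two 100.0 rows. Whether this is what happens in your data is an assumption on my part, but it is worth checking.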

ms access compare value with average of trailing seven days

I have a data set which contains people, dates, food, and quantity of food. I want a query where I specify a person and a date and have two values returned: the quantity of food eaten on the date chosen and the average quantity of food eaten over the previous 7 days.
So if I pick Abe on 1/10/2013 I get "1" and "3.6" because he ate 1 piece of fruit on 1/10 and an average of 3.6 pieces of fruit each day between 1/3 and 1/9.
name,thedate,qty,food
abe,1/2/2013,1,orange
abe,1/2/2013,3,pear
abe,1/3/2013,3,orange
abe,1/4/2013,2,orange
abe,1/4/2013,2,plum
abe,1/5/2013,1,orange
abe,1/7/2013,7,onion
abe,1/8/2013,2,orange
abe,1/9/2013,3,orange
abe,1/9/2013,2,pear
abe,1/10/2013,1,orange
jen,1/1/2013,2,orange
jen,1/4/2013,3,orange
jen,1/5/2013,2,orange
You need a correlated subquery to find this
Select
Parent.name
, Parent.thedate
, Parent.qty,
(SELECT avg(qty)
FROM yourTable
where name = parent.name
and thedate < parent.theDate
and theDate>=dateadd("d", datediff("d",0, parent.theDate)-7,0)
group by name) as previousSeven
from yourTable Parent
If this is actually on a per-food-type basis, you can match on that too with and food = parent.food; you then need to add food to the group by as well.
Update
To find, not an average, but the sum of the quantities divided by the number of distinct days with data in the last 7 days, you will need something more like this (it gets a lot more complicated because Access doesn't support the SELECT COUNT(DISTINCT something) syntax):
Select
Name
, theDate
, qty
, sumOfPreviousSeven/distinctDaysWithDataLastSeven
from (
Select
Parent.name
, Parent.thedate
, Parent.qty
, (SELECT sum(qty)
FROM table4
where name = parent.name
and thedate < parent.theDate
and theDate>=dateadd("d", datediff("d",0, parent.theDate)-7,0)
group by name) as sumOfPreviousSeven
, (select top 1 count(distinctDates) from
(select dateadd("d", datediff("d",0, theDate),0) as distinctDates, name from table4
group by dateadd("d", datediff("d",0, theDate),0), name)
where name = parent.name
and distinctDates < parent.theDate
and distinctDates>=dateadd("d", datediff("d",0, parent.theDate)-7,0)
group by name) as distinctDaysWithDataLastSeven
from table4 Parent) as base
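As a sanity check on the numbers, here is the same calculation in plain Python for Abe's sample rows. The function name and the window parameter are my own; it is just a sketch of the arithmetic, not Access code.

```python
from datetime import date, timedelta

# Abe's rows from the sample data: (date, qty).
abe = [
    (date(2013, 1, 2), 1), (date(2013, 1, 2), 3),
    (date(2013, 1, 3), 3),
    (date(2013, 1, 4), 2), (date(2013, 1, 4), 2),
    (date(2013, 1, 5), 1),
    (date(2013, 1, 7), 7),
    (date(2013, 1, 8), 2),
    (date(2013, 1, 9), 3), (date(2013, 1, 9), 2),
    (date(2013, 1, 10), 1),
]

def day_and_trailing_average(rows, chosen, window=7):
    """Return (qty on the chosen day, sum over the previous `window` days
    divided by the number of distinct days that have data)."""
    start = chosen - timedelta(days=window)
    today = sum(q for d, q in rows if d == chosen)
    previous = [(d, q) for d, q in rows if start <= d < chosen]
    days_with_data = {d for d, _ in previous}
    if not days_with_data:
        return today, None
    return today, sum(q for _, q in previous) / len(days_with_data)

today_qty, trailing_avg = day_and_trailing_average(abe, date(2013, 1, 10))
print(today_qty, trailing_avg)
```

For 1/10/2013 this gives 1 for the day itself and 22 / 6 for the previous seven days (22 items over 6 days with data), which is presumably the "3.6" in the question.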

SQL Server: Group similar sales together

I'm trying to do some reporting in SQL Server.
Here's the basic table setup:
Order (ID, DateCreated, Status)
Product(ID, Name, Price)
Order_Product_Mapping(OrderID, ProductID, Quantity, Price, DateOrdered)
I want to create a report that groups products with a similar amount of sales over a time period, like this:
Sales over 1 month:
Coca, Pepsi, Tiger: $20000 average(coca:$21000, pepsi: $19000, tiger: $20000)
Bread, Meat: $10000 avg (bread:$11000, meat: $9000)
(Note that the text in parentheses is just for clarification and is not needed in the report.)
The user defines how much sales may vary and still be considered similar. For example, sales that vary by less than 5% are considered similar and should be grouped together. The time period is also user-defined.
I can calculate total sales over a period, but I have no idea how to group them by sales variation. I'm using SQL Server 2012.
Any help is appreciated.
Sorry, my English is not very good :)
UPDATE: *I figured out what I actually need ;)*
For a known array of numbers like: 1,2,3,50,52,100,102,105
I need to group them into groups which have at least 3 numbers, where the difference between any two items in a group is smaller than 10.
For the above array, output should be:
[1,2,3]
[100,102,105]
=> the algorithm takes 3 params: the array, the minimum number of items to form a group, and the maximum difference between 2 items.
How can I implement this in C#?
By the way, if you just want C#:
using System.Collections.Generic;
using System.Linq;
var maxDifference = 10;
var minItems = 3;
// I just assume your list is not ordered, so order it first
var array = (new List<int> {3, 2, 50, 1, 51, 100, 105, 102}).OrderBy(a => a);
var result = new List<List<int>>();
var group = new List<int>();
var lastNum = array.First();
var totalDiff = 0;
foreach (var n in array)
{
    totalDiff += n - lastNum;
    // if the distance between the current number and the first number
    // in the current group is within the threshold, add it to the current group
    if (totalDiff <= maxDifference)
    {
        group.Add(n);
        lastNum = n;
        continue;
    }
    // if current group has minItems items or more, add to final result
    if (group.Count >= minItems)
        result.Add(group);
    // start new group
    group = new List<int>() { n };
    lastNum = n;
    totalDiff = 0;
}
// forgot the last group...
if (group.Count >= minItems)
    result.Add(group);
The key here is that the array needs to be ordered, so that you do not need to jump around or store values to calculate distances.
I can't believe I did it~~~
-- this threshold is the key in this query
-- it means that
-- if the difference between two values is less than the threshold,
-- these two values belong to one group
-- in your case, I think it is 200
DECLARE @th int
SET @th = 200
-- very simple, calculate total price for a time range
;WITH totals AS (
SELECT p.name AS col, sum(op.price * op.quantity) AS val
FROM order_product_mapping op
JOIN [order] o ON o.id = op.orderid
JOIN product p ON p.id = op.productid
WHERE dateordered > '2013-03-01' AND dateordered < '2013-04-01'
GROUP BY p.name
),
-- give a row number for each row
cte_rn AS (
SELECT col, val, row_number()over(ORDER BY val DESC) rn
FROM totals
),
-- show starts now,
-- firstly, we make each row know the row before it
cte_last_rn AS (
SELECT col, val, rn, CASE WHEN rn = 1 THEN 1 ELSE rn - 1 END lrn
FROM cte_rn
),
-- then we join current to the row before it, and calculate
-- the difference between the total price of current row and that of previous row
-- if the difference is more than the threshold we make it '1', otherwise '0'
cte_range AS (
SELECT
c1.col, c1.val,
CASE
WHEN c2.val - c1.val <= @th THEN 0
ELSE 1
END AS range,
c1.rn -- the current row's own number, not the previous row's
FROM cte_last_rn c1
JOIN cte_rn c2 ON c1.lrn = c2.rn
),
-- even trickier here,
-- now, we join last cte to itself, and for each row
-- sum all the values (0, 1 that calculated previously) of rows before current row
cte_rank AS (
SELECT c1.col, c1.val, sum(c2.range) rank
FROM cte_range c1
JOIN cte_range c2 ON c1.rn >= c2.rn
GROUP BY c1.col, c1.val
)
-- now we have properly grouped these total prices, and we can group on their rank
SELECT
avg(c1.val) AVG,
(
SELECT c2.col + ', ' AS 'data()'
FROM cte_rank c2
WHERE c2.rank = c1.rank
ORDER BY c2.val desc
FOR xml path('')
) product,
(
SELECT cast(c2.val AS nvarchar(MAX)) + ', ' AS 'data()'
FROM cte_rank c2
WHERE c2.rank = c1.rank
ORDER BY c2.val desc
FOR xml path('')
) price
FROM cte_rank c1
GROUP BY c1.rank
HAVING count(1) > 2
The result will look like:
AVG PRODUCT PRICE
28 A, B, C 30, 29, 27
12 D, E, F 15, 12, 10
3 G, H, I 4, 3, 2
To understand how I did the concatenation, please read this:
Concatenate many rows into a single text string?
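The core of the CTE chain above is: sort the totals, flag each row with 1 when the gap to the previous row exceeds the threshold, and take a running sum of the flags as the group number. Here is that idea as a short Python sketch; the function name is my own, the values are taken from the sample result, and the threshold of 3 is an assumption chosen to fit them.

```python
from itertools import accumulate

def group_by_gap(totals, threshold, min_items=3):
    """Group (name, value) pairs whose neighbouring values, sorted in
    descending order, differ by at most `threshold`."""
    ordered = sorted(totals, key=lambda item: item[1], reverse=True)
    values = [value for _, value in ordered]
    # 1 marks a row that starts a new group, 0 a row that continues one.
    breaks = [0] + [
        1 if prev - cur > threshold else 0
        for prev, cur in zip(values, values[1:])
    ]
    # The running sum of the flags is the group number of each row.
    ranks = list(accumulate(breaks))
    groups = {}
    for (name, value), rank in zip(ordered, ranks):
        groups.setdefault(rank, []).append((name, value))
    return [g for g in groups.values() if len(g) >= min_items]

# Values taken from the sample result; the threshold of 3 is an assumption.
totals = [("A", 30), ("B", 29), ("C", 27), ("D", 15), ("E", 12),
          ("F", 10), ("G", 4), ("H", 3), ("I", 2)]
groups = group_by_gap(totals, threshold=3)
for g in groups:
    print(g)
```

Note that this compares neighbouring rows only, so a long chain of small gaps ends up in one group even if its first and last values are far apart.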
This query should produce what you expect; it displays product sales for every month for which you have orders:
SELECT CONVERT(CHAR(4), OP.DateOrdered, 100) + CONVERT(CHAR(4), OP.DateOrdered, 120) As Month ,
Product.Name ,
AVG( OP.Quantity * OP.Price ) As Turnover
FROM Order_Product_Mapping OP
INNER JOIN Product ON Product.ID = OP.ProductID
GROUP BY CONVERT(CHAR(4), OP.DateOrdered, 100) + CONVERT(CHAR(4), OP.DateOrdered, 120) ,
Product.Name
Not tested, but if you provide sample data I could work on it
Looks like I made things more complicated than they should be.
Here is what should solve the problem:
-Run a query to get the sales for each product.
-Run k-means or a similar algorithm.

select least row per group in SQL

I am trying to select the minimum price for each condition category. I did some searching and wrote the code below. However, it shows null for the selected fields. Any solution?
SELECT Sales.Sale_ID, Sales.Sale_Price, Sales.Condition
FROM Items
LEFT JOIN Sales ON ( Items.Item_ID = Sales.Item_ID
AND Sales.Expires_DateTime > NOW( )
AND Sales.Sale_Price = (
SELECT MIN( s2.Sale_Price )
FROM Sales s2
WHERE Sales.`Condition` = s2.`Condition` ) )
WHERE Items.ISBN =9780077225957
A little more complicated solution, but one that includes your Sale_ID is below.
SELECT TOP 1 Sale_Price, Sale_ID, Condition
FROM Sales
WHERE Sale_Price IN (SELECT MIN(Sale_Price)
FROM Sales
WHERE
Expires_DateTime > NOW()
AND
Item_ID IN
(SELECT Item_ID FROM Items WHERE ISBN = 9780077225957)
GROUP BY Condition )
The 'TOP 1' is there in case more than 1 sale had the same minimum price and you only wanted one returned.
(internal query taken directly from @Michael Ames's answer)
If you don't need Sales.Sale_ID, this solution is simpler:
SELECT MIN(Sale_Price), Condition
FROM Sales
WHERE Expires_DateTime > NOW()
AND Item_ID IN
(SELECT Item_ID FROM Items WHERE ISBN = 9780077225957)
GROUP BY Condition
Good luck!
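If you need one row per condition together with its Sale_ID, a common pattern is to join the table back to the grouped minimum. Below is a minimal sketch with Python's sqlite3 module. The sample rows are invented, Item_ID = 10 stands in for the ISBN lookup against Items, and the expiry filter is left out for brevity.

```python
import sqlite3

# Hypothetical sample rows; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Sales (Sale_ID INT, Item_ID INT, Sale_Price REAL, "Condition" TEXT)'
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        (1, 10, 25.0, "new"),
        (2, 10, 20.0, "new"),
        (3, 10, 12.0, "used"),
        (4, 10, 15.0, "used"),
        (5, 99, 1.0, "used"),   # a different item, must be ignored
    ],
)

# Join each sale to the minimum price of its own condition group.
cheapest = conn.execute("""
    SELECT s.Sale_ID, s.Sale_Price, s."Condition"
    FROM Sales AS s
    JOIN (
        SELECT "Condition" AS cond, MIN(Sale_Price) AS min_price
        FROM Sales
        WHERE Item_ID = 10
        GROUP BY "Condition"
    ) AS m ON m.cond = s."Condition" AND m.min_price = s.Sale_Price
    WHERE s.Item_ID = 10
    ORDER BY s."Condition"
""").fetchall()

print(cheapest)
```

If two sales tie for the minimum price in a condition, this returns both of them, so add a tie-breaker if you need exactly one row per condition.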