Avoid double counting - only count first occurrence in table - sql

I am trying to count, by month, the total number of items (serialnumber) that appear in inventory.
This can usually be solved easily with distinct; however, I only want to count a serial number the first time it appears (its first insert).
This query gets me most of the way there.
select date_trunc('month', date) as Date, productid, count(distinct serialnumber) from inventory
where date_trunc('month', date) >= '2016-01-01' and productID in ('1','2') and status = 'INSERT'
group by date_trunc('month', date), productid
order by date_trunc('month', date) desc
But I realize I am double/triple/quadruple counting some serial numbers because an item can reappear in our inventory multiple times over the course of its lifecycle.
The query above covers these scenarios since the serial numbers appear once:
Shows up as new
Shows up as used
Below are the use cases where I realize I may be double/triple/quadruple counting:
Shows up as new then comes back around as used (no limit to how many times it can appear used)
Shows up used then comes back again as used (no limit to how many times it can appear used)
Here's an example I ran into.
(Note: I have added the condition column to better illustrate this.) This particular serial number has been in inventory three times (first as new, then as used twice):
Date      ProductID  Count  Condition
7-1-21    1          1      u
11-1-18   1          1      u
2-1-17    1          1      n
In my current query results, each insert gets counted (once in Feb 2017, once in Nov 2018 and once in July 2021).
How can I amend my query to make sure I'm only counting the very first instance (insert) a particular serial number appears in the inventory table?

In a subquery, calculate the first insert date of each product/serial number using the min aggregate function. Then count the items in that result:
select Date, productid, count(serialnumber)
from (
    select min(date_trunc('month', date)) as Date, productid, serialnumber
    from inventory
    where date_trunc('month', date) >= '2016-01-01'
    and productID in ('1','2')
    and status = 'INSERT'
    group by productid, serialnumber
) x
group by Date, productid
order by Date desc;
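A minimal sketch of the first-insert idea, run against an in-memory SQLite database from Python so it can be tried without a server. The rows are made up for illustration, and because SQLite has no date_trunc, strftime('%Y-%m', ...) stands in for it (an assumption, not part of the original query). One deliberate difference: the date filter is applied after the MIN rather than before it, so a serial number whose first insert was before 2016 is not counted again at its first post-2016 insert. If counting from the first post-2016 insert is what you want, keep the filter inside the subquery as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (date TEXT, productid TEXT, serialnumber TEXT, status TEXT)"
)
# Hypothetical rows: serial 'A' is inserted three times, 'B' once,
# and 'C' was first inserted before the 2016 cutoff.
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?, ?)",
    [
        ("2017-02-01", "1", "A", "INSERT"),
        ("2018-11-01", "1", "A", "INSERT"),
        ("2021-07-01", "1", "A", "INSERT"),
        ("2018-11-15", "1", "B", "INSERT"),
        ("2015-06-01", "2", "C", "INSERT"),
        ("2019-03-01", "2", "C", "INSERT"),
    ],
)

query = """
SELECT first_month, productid, COUNT(serialnumber) AS n
FROM (
    -- first insert month of every serial number, over its whole history
    SELECT strftime('%Y-%m', MIN(date)) AS first_month, productid, serialnumber
    FROM inventory
    WHERE productid IN ('1', '2') AND status = 'INSERT'
    GROUP BY productid, serialnumber
) x
WHERE first_month >= '2016-01'   -- date filter applied after the MIN
GROUP BY first_month, productid
ORDER BY first_month DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With this sample data, serial 'A' is counted once (in 2017-02) and 'C' is not counted at all.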

Related

How to write an SQL query to get max number of counts for the most number of travelling of a user within a month

My manager has given me the task of writing a SQL query that returns the highest number of trips (number of records) made by any one user within a month, with the condition that if the user travels to multiple places on the same date, it is counted as one. For instance, if you look at the following table design, my query must return a count of 2 for this scenario. Although traveller_id "1" has travelled three times within the month, he travelled to Thailand and the USA on the same date, which is why his count is reduced to 2.
I have also developed my logic for this query but I am unable to write it due to lack of syntax knowledge. I split up this query into 3 parts:
Select All records from the table within a month using the MONTH function of SQL
Select All distinct DateTime records from the above result so that the same DateTime gets eliminated.
Select max number of counts for the traveller who visited most places.
Please help me in completing my query. You can also use a different approach from mine.
You can do the count aggregation in a CTE, then select top(1):
with u as
(select traveller_id,
count(distinct visit_date) as n
from travellers_log
where visit_date between '2022-03-01' and '2022-03-31'
group by traveller_id)
select top(1) traveller_id, name, n from u inner join table_travellers
on u.traveller_id = table_travellers.id
order by n desc;
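To try the query without a SQL Server instance, here is a sketch using Python's sqlite3 module. The table and column names follow the answer above; the sample rows, the place column, and the traveller names are invented for illustration. SQLite does not support TOP, so LIMIT 1 is used in its place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_travellers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE travellers_log (traveller_id INTEGER, place TEXT, visit_date TEXT);
INSERT INTO table_travellers VALUES (1, 'Ann'), (2, 'Bob');
-- Hypothetical data: traveller 1 visits two places on the same date.
INSERT INTO travellers_log VALUES
    (1, 'Thailand', '2022-03-05'),
    (1, 'USA',      '2022-03-05'),
    (1, 'Japan',    '2022-03-20'),
    (2, 'France',   '2022-03-10'),
    (2, 'Italy',    '2022-04-02');
""")

top = conn.execute("""
WITH u AS (
    -- one count per traveller; same-day visits collapse into one
    SELECT traveller_id, COUNT(DISTINCT visit_date) AS n
    FROM travellers_log
    WHERE visit_date BETWEEN '2022-03-01' AND '2022-03-31'
    GROUP BY traveller_id
)
SELECT u.traveller_id, t.name, u.n
FROM u JOIN table_travellers t ON u.traveller_id = t.id
ORDER BY u.n DESC
LIMIT 1
""").fetchone()
print(top)
```

Traveller 1 has three log rows in March but only two distinct dates, so the returned count is 2.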

Google Sheets Query Function. How can I get only Unique or Distinct Rows?

I am trying to answer a question on a case using the Query function on Google Sheets and am stuck on a particular problem.
I need to get the total number of unique orders per year. I used the formula below and managed to get the total orders per year.
=QUERY(raw_data!$A$1:$U$9995, "select YEAR(C), COUNT(B) group by YEAR(C)", 1)
Where column C is the date and B is the order_id.
The problem is that this returns a total of 9994 orders and includes duplicates of the same order. For example, if a customer purchased 3 different products, they would each be given a line in the database and would count as 3 of the 9994 orders. However, they all have the same order_id.
I need to get the number of unique orders per year. I know this number is 5009 since I did some manual research through Excel, but wanted to find that same total, separated by year, using the Query Function since this is a case to test my SQL Knowledge.
Is this possible? Does the Query Function have a way to get the count for unique order_ids? Thank you very much for your help!
See if this helps
=QUERY(UNIQUE(raw_data!$B$1:$C$9995), "select YEAR(Col2), COUNT(Col1) where Col2 is not null group by YEAR(Col2)", 1)
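Since the case is meant to exercise SQL, it may help to see the same per-year count in a SQL engine that supports COUNT(DISTINCT ...). This is only a sketch using Python's sqlite3 module; the column names order_id, order_date and product and the sample rows are assumptions made for illustration, not taken from the sheet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_data (order_id TEXT, order_date TEXT, product TEXT)")
conn.executemany(
    "INSERT INTO raw_data VALUES (?, ?, ?)",
    [
        # one order with three product lines counts once
        ("O-1", "2016-01-10", "chair"),
        ("O-1", "2016-01-10", "desk"),
        ("O-1", "2016-01-10", "lamp"),
        ("O-2", "2016-05-02", "desk"),
        ("O-3", "2017-02-14", "lamp"),
    ],
)
per_year = conn.execute("""
SELECT strftime('%Y', order_date) AS year, COUNT(DISTINCT order_id) AS orders
FROM raw_data
GROUP BY year
ORDER BY year
""").fetchall()
print(per_year)
```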

Count records in each month using Month(dateField) in SQL

I have a large table containing c. 650,000 records. They are individuals with email addresses, and one of the fields is 'dateOfApplication'. I have been asked for a breakdown of how many people signed up in each month.
I'd like the results to look something like
Month Year Total
1 2017 50763
2 2017 34725
And have made a target table in this format to put the results in. I've been able to use Month(dateOfApplication) to get the month component of the date using
SELECT DISTINCT
(SELECT COUNT(1) FROM [UG_Master]
WHERE MONTH([UG_Master].dateOfApplication) = '6') as Total
To return particular months, but don't really know how to get one row for each month it finds.
You can use GROUP BY:
SELECT MONTH([UG_Master].dateOfApplication), COUNT(1)
FROM [UG_Master]
GROUP BY MONTH([UG_Master].dateOfApplication);
If you want the months broken down by year, include the year as well:
SELECT YEAR([UG_Master].dateOfApplication), MONTH([UG_Master].dateOfApplication), COUNT(1)
FROM [UG_Master]
GROUP BY YEAR([UG_Master].dateOfApplication), MONTH([UG_Master].dateOfApplication);
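A small runnable sketch of the year-and-month grouping, using Python's sqlite3 module. SQLite has no MONTH() or YEAR() functions, so strftime is used instead (an assumption for this demo only); the email column and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UG_Master (email TEXT, dateOfApplication TEXT)")
conn.executemany(
    "INSERT INTO UG_Master VALUES (?, ?)",
    [
        ("a@example.com", "2017-01-03"),
        ("b@example.com", "2017-01-19"),
        ("c@example.com", "2017-02-07"),
        ("d@example.com", "2018-01-11"),
    ],
)
breakdown = conn.execute("""
SELECT CAST(strftime('%m', dateOfApplication) AS INTEGER) AS month,
       CAST(strftime('%Y', dateOfApplication) AS INTEGER) AS year,
       COUNT(*) AS total
FROM UG_Master
GROUP BY year, month
ORDER BY year, month
""").fetchall()
print(breakdown)
```

Grouping by month alone would merge January 2017 and January 2018 into a single row, which is why the year is part of the GROUP BY here.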

SQL query that calculates historical average and checks if current value is greater multiple than 3

I am trying to calculate the average since the last time stamp and pull all records where the average is greater than 3. My current query is:
SELECT AVG(BID)/BID AS Multiple
FROM cdsData
where Multiple > 3
and SqlUnixTime > 1492225582
group by ID_BB_RT;
I have a table cdsData, and the Unix time is April 15th converted. Finally, I want the group by calculated within the ID as I show. I'm not sure why it's failing, but it says that the field Multiple is unknown in the where clause.
I think your intention is correctly stated as follows, "I am trying to calculate the average since the last time stamp and select all rows where the average is greater than 3 times the individual bid".
In fact, a still better restatement of your objective would be, "I want to select all rows since the last time stamp, where the bid is less than 1/3rd the average bid".
For this, the steps are as follows:
1) A sub-query finds the average bid divided by 3, of rows since the last time stamp.
2) The outer query selects rows since the last time stamp, where the individual bid is < the value returned by the sub-query.
The following SQL statement does that:
SELECT BID
FROM cdsData
WHERE SqlUnixTime > 1492225582
AND BID <
(
SELECT AVG(BID) / 3
FROM cdsData
WHERE SqlUnixTime > 1492225582
)
ORDER BY BID;
1)
SQL clauses are not evaluated in the order they are written: logically, the where clause is evaluated before the select clause. Because of this, the alias Multiple for AVG(BID)/BID does not exist yet when the where clause is evaluated.
You can try this.
SELECT AVG(BID)/BID AS Multiple
FROM cdsData
WHERE SqlUnixTime > 1492225582
GROUP BY ID_BB_RT Having (AVG(BID)/BID)>3 ;
Or
Select Multiple
From (SELECT AVG(BID)/BID AS Multiple
FROM cdsData
Where SqlUnixTime > 1492225582 group by ID_BB_RT) X
Where Multiple > 3
2)
Once you have corrected the above error, you will get one more error:
Column 'BID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
To correct this, you have to add the BID column to the group by clause.
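One caveat with adding BID to the group by: every group then contains a single BID value, so AVG(BID)/BID is always 1 and the filter on 3 never matches. A possible alternative, sketched here with Python's sqlite3 module and invented rows, is to compute the average per ID_BB_RT in a derived table and join it back to the individual rows. The alias avg_bid and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cdsData (ID_BB_RT TEXT, BID REAL, SqlUnixTime INTEGER)")
conn.executemany(
    "INSERT INTO cdsData VALUES (?, ?, ?)",
    [
        ("X", 1.0, 1492225600),
        ("X", 20.0, 1492225700),
        ("Y", 2.0, 1492225800),
        ("Y", 30.0, 1492225900),
        ("Y", 0.5, 1492000000),  # before the cutoff, ignored
    ],
)
rows = conn.execute("""
SELECT d.ID_BB_RT, d.BID, g.avg_bid / d.BID AS Multiple
FROM cdsData d
JOIN (
    -- average bid per ID since the cutoff time stamp
    SELECT ID_BB_RT, AVG(BID) AS avg_bid
    FROM cdsData
    WHERE SqlUnixTime > 1492225582
    GROUP BY ID_BB_RT
) g ON g.ID_BB_RT = d.ID_BB_RT
WHERE d.SqlUnixTime > 1492225582
  AND g.avg_bid / d.BID > 3   -- repeat the expression, not the alias
ORDER BY d.ID_BB_RT, d.BID
""").fetchall()
print(rows)
```

The where clause repeats the expression instead of referring to the alias Multiple, which keeps the query portable to engines that reject select aliases in where.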

Joining a second instance of Sales table to get last week's Sales

I have a Sales table showing product number, sales value, and sales volume per week. I need to build a report to display these values and volumes along with the equivalent values from the previous week. I also have a Weeks table which gives me the previous week number for the current week (for instance if current week is 2013-01, then the previous week value is 2012-52).
I therefore assumed it would be simple enough to join to another instance of Sales on product number and the previous week number from the Weeks table. However, Teradata is not letting me do this. Initially it threw an error of "Improper column reference in the search condition of a joined table", and when I re-ordered the query to reference Weeks before the second instance of Sales, it tried to run but gave me a "No more spool space" error, so I assume my approach is incorrect. My SQL is as follows:
select s.Week_Number,
s.Product_Number,
s.Sales_Value,
s.Sales_Volume,
s_lw.Sales_Value,
s_lw.Sales_Volume
from SALES s
inner join WEEKS w
on s.Week_Number = w.Week_Number
left join SALES s_lw
on s.Product_Number = s_lw.Product_Number
and s_lw.Week_Number = w.Last_week_Number
Could anyone please suggest what I'm doing wrong here? It seems like this should be achievable.
I would suggest using a Window Aggregate Function to accomplish this with a single pass of the SALES table:
SELECT DISTINCT
s.Week_Number,
s.Product_Number,
s.Sales_Value,
s.Sales_Volume,
MAX(s.Sales_Value) OVER (PARTITION BY s.Product_Number
ORDER BY s.Week_Number -- ascending, so the preceding row is the earlier week
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS LW_Sales_Value,
MAX(s.Sales_Volume) OVER (PARTITION BY s.Product_Number
ORDER BY s.Week_Number
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS LW_Sales_Volume
FROM SALES s;
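The window-frame approach can be checked on a few rows with Python's sqlite3 module. This is a sketch under several assumptions: the bundled SQLite is 3.25 or newer (needed for window functions), Week_Number is stored as text that sorts chronologically, and the sample rows are invented. Weeks are ordered ascending so that the row 1 PRECEDING is the earlier week.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALES (Week_Number TEXT, Product_Number TEXT, "
    "Sales_Value REAL, Sales_Volume INTEGER)"
)
conn.executemany(
    "INSERT INTO SALES VALUES (?, ?, ?, ?)",
    [
        ("2012-52", "P1", 100.0, 10),
        ("2013-01", "P1", 120.0, 12),
        ("2013-02", "P1", 90.0, 9),
        ("2013-01", "P2", 50.0, 5),
    ],
)
report = conn.execute("""
SELECT Week_Number, Product_Number, Sales_Value, Sales_Volume,
       MAX(Sales_Value) OVER (PARTITION BY Product_Number
                              ORDER BY Week_Number
                              ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS LW_Sales_Value,
       MAX(Sales_Volume) OVER (PARTITION BY Product_Number
                               ORDER BY Week_Number
                               ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS LW_Sales_Volume
FROM SALES
ORDER BY Product_Number, Week_Number
""").fetchall()
for row in report:
    print(row)
```

The frame picks up the previous row for the product, which is the previous week only if every week has a row for that product. The first week of each product has no preceding row, so its LW columns come back as NULL (None in Python).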