SELECT DISTINCT sum in google sheet - sql

I have a sheet like this:
Month (Col11) Team (Col2)
03 luna
03 luna
04 pippo
04 gigi
04 luna
04 gigi
04 pippo
04 luna
04 luna
04 pippo
04 pippo
04 grisbi
04 grisbi
05 luna
05 luna
05 pippo
05 pippo
05 grisbi
05 grisbi
I need the count of unique teams for each month, with a result like this:
Month (Col11) Sum of unique (Col2)
03 1
04 4
05 3
I tried:
=QUERY(database_tornei!A:K;"select Col11,count(Col2) group by Col11")
But that gives me the count of all teams in Col2, duplicates included.
I don't know how to use DISTINCT in QUERY :(

You can wrap your existing query() in another query() like this:
=query( query(A1:K, "select K, B, count(K) where K is not null group by K, B", 1), "select Col1, count(Col3) group by Col1", 1 )
This will get the count of uniques per month.
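The same two-step idea (group by month and team first, then count the resulting rows per month) can be sketched outside Sheets. This is only an illustration using Python's built-in sqlite3 module; the table name `entries` and its column names are assumptions, and the rows are the sample data from the question:

```python
import sqlite3

# Sample (month, team) rows from the question. The table and column names
# used below are hypothetical; they do not exist in the original sheet.
ROWS = (
    [("03", "luna")] * 2
    + [("04", t) for t in ("pippo", "gigi", "luna", "gigi", "pippo", "luna",
                           "luna", "pippo", "pippo", "grisbi", "grisbi")]
    + [("05", t) for t in ("luna", "luna", "pippo", "pippo", "grisbi", "grisbi")]
)


def unique_teams_per_month(rows):
    """Return [(month, number of distinct teams)], ordered by month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (month TEXT, team TEXT)")
    conn.executemany("INSERT INTO entries VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT month, COUNT(*)            -- outer step: rows per month
        FROM (SELECT month, team          -- inner step: one row per pair
              FROM entries
              GROUP BY month, team)
        GROUP BY month
        ORDER BY month
        """
    ).fetchall()
    conn.close()
    return result


print(unique_teams_per_month(ROWS))
```

The inner query plays the role of the inner query() call: it collapses duplicates so that the outer count sees each team once per month.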

I asked some questions in the comments below the original post, but haven't heard back yet. So I'll just provide two versions of a formula, one for each case.
If your data in Col11 is text/strings, use this:
=ArrayFormula(QUERY(UNIQUE({K2:K,B2:B}),"Select Col1, COUNT(Col1) WHERE Col1 Is Not Null GROUP BY Col1 LABEL Col1 'Month', COUNT(Col1) 'Count Unique'"))
If your data in Col11 is real numbers, use this:
=ArrayFormula(QUERY(UNIQUE({TEXT(K2:K,"00"),B2:B}),"Select Col1, COUNT(Col1) WHERE Col1 <> '00' GROUP BY Col1 LABEL Col1 'Month', COUNT(Col1) 'Count Unique'"))
In this second case, you need to convert the numbers to text in order to produce a format with a leading zero as shown in your post example.
This still does not account for distinguishing the months from different years. If your sheet will only ever have months from a single year, you don't need to worry about it. But if you will keep cumulative data that spans more than one calendar year, you will want to change the format of Col11 to something like this:
03 [2021]
04 [2021]
...
12 [2021]
01 [2022]
02 [2022]
03 [2022]
If you decide to do it this way, you can still use the first formula I provided above. It will simply ensure that months from different years stay separate.
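To make the "deduplicate first, then count" logic and the year-qualified month key concrete, here is a rough Python sketch. The function name and the sample rows are hypothetical (the original sheet has no year column); only the `MM [YYYY]` label format comes from the suggestion above:

```python
from collections import Counter


def count_unique_by_month(rows):
    """rows: iterable of (year, month, team) tuples.

    Returns {"MM [YYYY]": number of distinct teams in that month}.
    """
    unique_rows = set(rows)  # same role as UNIQUE({...}) in the formula
    counts = Counter(
        f"{month:02d} [{year}]" for year, month, _team in unique_rows
    )
    return dict(counts)


# Hypothetical rows spanning two years, for illustration only.
sample = [
    (2021, 3, "luna"), (2021, 3, "luna"), (2021, 3, "pippo"),
    (2022, 3, "luna"), (2022, 3, "luna"),
]
print(count_unique_by_month(sample))
```

Because the year is part of the key, March 2021 and March 2022 are counted separately instead of being merged into a single "03" row.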

Related

How to join two tables while only selecting the highest day of each month from one table

I have two tables. One with metadata, one with billing data. I need to join those efficiently in order to assign metadata to costs.
Table 1 (metadata) looks like this:
year month day id label1 label2
2021 06 04 892221805 foo aaa
2021 06 30 892221805 bar aaa
2021 06 04 594083437 baz aaa
2021 06 04 552604244 baz bbb
Table 2 (billing data) looks like this:
year month id cost
2021 06 892221805 1.00 $
2021 06 892221805 1.00 $
2021 06 594083437 1.00 $
2021 06 552604244 1.00 $
For each combination of year, month, id in Table 2, there is a corresponding ID in Table 1.
For each year, month, id in T2, I need label1, label2 from the row in T1 that matches year, month, id and has the latest day in that month, so that the result looks like this:
year month id cost label1 label2
2021 06 892221805 1.00 $ bar aaa
2021 06 892221805 1.00 $ bar aaa
2021 06 594083437 1.00 $ baz aaa
2021 06 552604244 1.00 $ baz bbb
I.e. the first row of T1 is not used, as the second row has labels with a newer date in that month.
I am using Athena on Amazon Web Services, which should be Presto compatible, I think.
How do I select this correctly? Preferably, in a way that can be used as a view.
You can use row_number() to get to the last row in a month:
select t2.*, t1.label1, t1.label2
from table2 t2 left join
     (select t1.*,
             row_number() over (partition by year, month, id order by day desc) as seqnum
      from table1 t1
     ) t1
     on t1.id = t2.id and t1.year = t2.year and
        t1.month = t2.month and t1.seqnum = 1;
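As a quick way to check the row_number() approach against the sample data, here is a sketch that runs the same query shape in SQLite through Python's sqlite3 module. It assumes a SQLite build with window functions (3.25 or later) rather than Athena, and it stores cost as a plain number instead of the "1.00 $" text shown in the question:

```python
import sqlite3

# Sample rows from the question; cost is stored as a number (assumption).
METADATA = [
    (2021, 6, 4, 892221805, "foo", "aaa"),
    (2021, 6, 30, 892221805, "bar", "aaa"),
    (2021, 6, 4, 594083437, "baz", "aaa"),
    (2021, 6, 4, 552604244, "baz", "bbb"),
]
BILLING = [
    (2021, 6, 892221805, 1.0),
    (2021, 6, 892221805, 1.0),
    (2021, 6, 594083437, 1.0),
    (2021, 6, 552604244, 1.0),
]


def latest_labels(metadata, billing):
    """Attach to each billing row the labels with the highest day of its month."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (year INT, month INT, day INT, id INT,
                             label1 TEXT, label2 TEXT);
        CREATE TABLE table2 (year INT, month INT, id INT, cost REAL);
    """)
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?, ?)", metadata)
    conn.executemany("INSERT INTO table2 VALUES (?, ?, ?, ?)", billing)
    rows = conn.execute("""
        SELECT t2.year, t2.month, t2.id, t2.cost, t1.label1, t1.label2
        FROM table2 t2
        LEFT JOIN (
            SELECT m.*,
                   ROW_NUMBER() OVER (PARTITION BY year, month, id
                                      ORDER BY day DESC) AS seqnum
            FROM table1 m
        ) t1
          ON t1.id = t2.id AND t1.year = t2.year
         AND t1.month = t2.month AND t1.seqnum = 1
        ORDER BY t2.id
    """).fetchall()
    conn.close()
    return rows


for row in latest_labels(METADATA, BILLING):
    print(row)
```

Keeping `seqnum = 1` in the ON clause rather than in a WHERE clause preserves the left join: billing rows without any metadata would still appear, with NULL labels.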

Create number sequence based on two columns

I'm trying to generate number sequence based on 2 columns, Sno and UnitCost. The numbers should run down sequentially but they shouldn't change when both the columns are same. But if any one column is different it should increment.
I tried something with row_number(), rank(), dense_rank() but have been unable to hit the right logic.
Here's the required column and existing columns:
Sno UnitCost RequiredColumn
ch01 10 01
ch01 10 01
ch02 20 02
ch02 20 02
ch02 30 03
ch02 30 03
ch03 10 04
Any tips? Thanks.
Using DENSE_RANK:
SELECT Sno, UnitCost, DENSE_RANK() OVER (ORDER BY Sno, UnitCost) RequiredColumn
FROM yourTable;
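To see why DENSE_RANK fits here (ties share a number and no numbers are skipped), the query can be tried on the sample rows. This sketch uses Python's sqlite3 module and assumes a SQLite build with window functions (3.25 or later); it returns plain integers, so the leading zero in the required column would still need formatting:

```python
import sqlite3

# Sample rows from the question: (Sno, UnitCost).
ROWS = [("ch01", 10), ("ch01", 10), ("ch02", 20), ("ch02", 20),
        ("ch02", 30), ("ch02", 30), ("ch03", 10)]


def sequence_numbers(rows):
    """Return (Sno, UnitCost, RequiredColumn) with a gap-free sequence."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (Sno TEXT, UnitCost INT)")
    conn.executemany("INSERT INTO yourTable VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT Sno, UnitCost, "
        "DENSE_RANK() OVER (ORDER BY Sno, UnitCost) AS RequiredColumn "
        "FROM yourTable ORDER BY Sno, UnitCost"
    ).fetchall()
    conn.close()
    return result


for sno, cost, seq in sequence_numbers(ROWS):
    print(sno, cost, f"{seq:02d}")  # zero-pad to match the required format
```

RANK() would leave gaps after ties (1, 1, 3, ...) and ROW_NUMBER() would number every row differently, which is why neither matches the required column.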

Get the chains and highest parent, in a hierarchy of organisations

I have the following data in TABLE_A, where ORG_1 is the parent of ORG_2:
ORG_1 ORG_2
01 02
02 03
02 04
05 06
So, org 01 is the parent of org 02, and org 02 is the parent of 03 and 04. Org 05 is the parent of only org 06.
I need to have unique names/numbers for the chains, and get reported the highest parent in the chain. Chain I define as 'all organisations that are related to each other'.
This is the desired result:
Chain ORG_1 ORG_2 Highest_Parent_In_Chain
1 01 02 01
1 02 03 01
1 02 04 01
2 05 06 05
Chain=1 has a tree structure starting from ORG_1=01. Chain=2 has its own chain.
I found some info about CONNECT BY, CONNECT BY PRIOR and CONNECT_BY_ROOT, but I can't get it working. Does anyone have an idea how to achieve this with a query in Oracle?
The chain number can be created with the analytic DENSE_RANK() function.
The highest parent in chain is a feature of hierarchical queries: the function CONNECT_BY_ROOT().
Your hierarchical table is non-standard - in a standard arrangement, the top levels (organizations 01 and 05) would also have a row where they appear as ORG_2, with NULL as ORG_1. That way the highest levels in the hierarchy are very easy to find: just look for ORG_1 IS NULL. As it is, the START WITH clause is more complicated, because we must find the tops first. For that we look for values of ORG_1 that do not also appear in ORG_2. That is the work done in the subquery in the START WITH clause.
with
  table_a ( org_1, org_2 ) as (
    select '01', '02' from dual union all
    select '02', '03' from dual union all
    select '02', '04' from dual union all
    select '05', '06' from dual
  )
-- End of simulated input data (for testing purposes only).
-- Solution (SQL query) begins BELOW THIS LINE.
select dense_rank() over (order by connect_by_root(org_1)) as chain,
       org_1, org_2,
       connect_by_root(org_1) as highest_parent_in_chain
from   table_a
connect by org_1 = prior org_2
start with org_1 in
       ( select org_1 from table_a a
         where not exists (select * from table_a where org_2 = a.org_1)
       )
;
CHAIN ORG_1 ORG_2 HIGHEST_PARENT_IN_CHAIN
----- ----- ----- -----------------------
1 01 02 01
1 02 03 01
1 02 04 01
2 05 06 05
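For readers without an Oracle instance, the same logic can be approximated with a recursive CTE, which stands in for CONNECT BY and CONNECT_BY_ROOT here. This is a sketch in SQLite via Python's sqlite3 module, not the Oracle query itself, and it assumes a SQLite build with window functions (3.25 or later) and no cycles in the data:

```python
import sqlite3

# Parent -> child pairs from the question.
EDGES = [("01", "02"), ("02", "03"), ("02", "04"), ("05", "06")]


def chains(edges):
    """Return (chain, org_1, org_2, highest_parent_in_chain) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_a (org_1 TEXT, org_2 TEXT)")
    conn.executemany("INSERT INTO table_a VALUES (?, ?)", edges)
    rows = conn.execute("""
        WITH RECURSIVE tree (org_1, org_2, root) AS (
            -- Anchor: rows whose parent never appears as a child (the tops).
            SELECT a.org_1, a.org_2, a.org_1
            FROM table_a a
            WHERE NOT EXISTS (SELECT 1 FROM table_a b
                              WHERE b.org_2 = a.org_1)
            UNION ALL
            -- Step: follow parent -> child links, carrying the root along.
            SELECT c.org_1, c.org_2, t.root
            FROM table_a c
            JOIN tree t ON c.org_1 = t.org_2
        )
        SELECT DENSE_RANK() OVER (ORDER BY root) AS chain,
               org_1, org_2, root
        FROM tree
        ORDER BY chain, org_1, org_2
    """).fetchall()
    conn.close()
    return rows


for row in chains(EDGES):
    print(row)
```

The anchor member corresponds to the START WITH subquery, the recursive member to CONNECT BY org_1 = PRIOR org_2, and the carried `root` column to CONNECT_BY_ROOT(org_1).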

Oracle sql split amounts by weeks

So I have a table like:
UNIQUE_ID MONTH
abc 01
93j 01
acc 01
7as 01
oks 02
ais 02
asi 03
asd 04
etc
I query:
select count(unique_id) as amount, month
from table
group by month
now everything looks great:
AMOUNT MONTH
4 01
2 02
1 03
etc
is there a way to get oracle to split the amounts by weeks?
the way that the result look something like:
AMOUNT WEEK
1 01
1 02
1 03
1 04
etc
Assuming you know the year (let's say 2014), you first need to generate all the weeks of that year:
select rownum as week_no
from all_objects
where rownum < 53
Then work out which month each week falls in (for 2014):
select week_no,
       to_char(to_date('01-JAN-2014','DD-MON-YYYY') + 7*(week_no-1), 'MM') month_no
from
  (select rownum as week_no
   from all_objects
   where rownum < 53) weeks
Then join in your data
select week_no, month_no, test.unique_id
from (
  select week_no,
         to_char(to_date('01-JAN-2014','DD-MON-YYYY') + 7*(week_no-1), 'MM') month_no
  from
    (select rownum as week_no
     from all_objects
     where rownum < 53) weeks) wm
join test on wm.month_no = test.tmonth
This gives your data for each week, as you described above. You can then redo your query and count by week instead of month.
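The week-to-month mapping used above (1 January plus 7 days per week number) is easy to check outside the database. A small Python sketch of that same arithmetic, with a hypothetical function name, assuming 52 weeks as in the query:

```python
from datetime import date, timedelta


def week_to_month(year, weeks=52):
    """Map week_no (1..weeks) to the two-digit month that contains
    1 January of `year` plus 7 * (week_no - 1) days."""
    start = date(year, 1, 1)
    return {
        week_no: f"{(start + timedelta(days=7 * (week_no - 1))).month:02d}"
        for week_no in range(1, weeks + 1)
    }


mapping = week_to_month(2014)
for week_no in (1, 5, 6, 52):
    print(week_no, mapping[week_no])
```

Note that a week is assigned to the month of its first day only, so a week that straddles two months counts entirely toward the earlier one.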

SQL: without a cursor, how to select records making a unique integer id (identity like) for dups?

If you have the select statement below where the PK is the primary key:
select distinct dbo.DateAsInt( dateEntered) * 100 as PK,
recordDescription as Data
from MyTable
and the output is something like this (the first numbers are spaced for clarity):
PK Data
2010 01 01 00 New Years Day
2010 01 01 00 Make Resolutions
2010 01 01 00 Return Gifts
2010 02 14 00 Valentines day
2010 02 14 00 Buy flowers
and you want to output something like this:
PK Data
2010 01 01 01 New Years Day
2010 01 01 02 Make Resolutions
2010 01 01 03 Return Gifts
2010 02 14 01 Valentines day
2010 02 14 02 Buy flowers
Is it possible to make the "00" in the PK have an "identity" number effect within a single select? Otherwise, how could you increment the number by 1 for each found activity for that date?
I am already thinking as I type to try something like Sum(case when ?? then 1 end) with a group by.
Edit: (Answer provided by JohnFX below)
This was the final answer:
select A.PK * 100 + row_number() over
(partition by A.PK order by A.Data) as PK,
A.Data
from (select distinct -- removes duplicate rows before they are numbered
dbo.DateAsInt( dateEntered) as PK,
recordDescription as Data
from MyTable) A
order by PK, Data
I think you are looking for ROW_NUMBER and the associated PARTITION clause.
SELECT DISTINCT dbo.DateAsInt(DateEntered) * 100 as PK,
ROW_NUMBER() OVER (PARTITION BY DateEntered ORDER BY recordDescription) as rowID,
recordDescription as Data
FROM MyTable
If DateEntered has duplicate values you probably also want to check out DENSE_RANK() and RANK() depending on your specific needs for what to do with the incrementor in those cases.
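As an illustration of how ROW_NUMBER with PARTITION BY can fill in the last two digits of the key, here is a sketch run in SQLite through Python's sqlite3 module. It assumes a SQLite build with window functions (3.25 or later), and it assumes the date is already stored as an integer in a hypothetical `dateAsInt` column, since dbo.DateAsInt is a user function from the question that is not available here:

```python
import sqlite3

# Sample rows from the question, with the date already converted to an int.
EVENTS = [
    (20100101, "New Years Day"),
    (20100101, "Make Resolutions"),
    (20100101, "Return Gifts"),
    (20100214, "Valentines day"),
    (20100214, "Buy flowers"),
]


def numbered_keys(events):
    """Return (PK, Data) where PK = date * 100 + position within that date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (dateAsInt INT, recordDescription TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?)", events)
    rows = conn.execute("""
        SELECT dateAsInt * 100
               + ROW_NUMBER() OVER (PARTITION BY dateAsInt
                                    ORDER BY recordDescription) AS PK,
               recordDescription AS Data
        FROM (SELECT DISTINCT dateAsInt, recordDescription FROM MyTable)
        ORDER BY PK
    """).fetchall()
    conn.close()
    return rows


for pk, data in numbered_keys(EVENTS):
    print(pk, data)
```

The numbering follows the ORDER BY inside the window, so here descriptions are numbered alphabetically within each date rather than in the order shown in the question. The scheme also only works while a date has fewer than 100 entries.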
Datetime columns should never be used as a primary key. Plus, if you had an identity column, you couldn't have:
PK Data
2010 01 01 01 New Years Day
2010 01 01 02 Make Resolutions
2010 01 01 03 Return Gifts
2010 02 14 01 Valentines day
2010 02 14 02 Buy flowers
As 2010 01 01 01 would have the same identity as 2010 02 14 01.
The best approach would be to have a separate identity column, keep the holiday date in another column, and concatenate the nvarchar-converted values of the two fields.