SQL grouping - throwing out old results


Some data would be organized thusly:
ID DATE COUNT1 COUNT2
A 20120101 1 2
A 20120201 2 2
B 20120101 3 0
C 20111201 1 0
C 20120301 2 2
Another table has ID NAME
A MYNAME
.... etc
I want to return a table of
ID NAME COUNT1 COUNT2
using only the most recent available row for each ID, i.e. the January count for A is not included.
I know I need to use HAVING, INNER JOIN, and GROUP BY, but every iteration I come up with has an error.

If you only want rows with the date equal to the global maximum date, just use a subquery:
select ID,DATE,COUNT1,COUNT2
from table
where DATE=(select max(DATE) from table);
If you want the maximum date per ID, then you can use a self join:
select ID,MAX_DATE,COUNT1,COUNT2
from(
select ID,max(DATE) as MAX_DATE
from table
group by ID
)a
join(
select ID,DATE,COUNT1,COUNT2
from table
)b
on (a.ID=b.ID and a.MAX_DATE=b.DATE);

Not necessarily. This should also work:
select t1.id, t2.name, t1.count1, t1.count2
from table_1 t1 join table_2 t2 on (t1.id = t2.id)
where not exists (
select 1
from table_1 t3
where t1.id = t3.id
and t1.date < t3.date)
order by 1;

You'll need a correlated subquery:
SELECT T1.Id, NamesTable.Name, T1.Count1, T1.Count2
FROM CountsTable AS T1 INNER JOIN NamesTable ON T1.Id=NamesTable.Id
WHERE T1.Date = (
SELECT Max(Date) From CountsTable AS T2 WHERE T1.Id=T2.Id
)
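As a quick check of the correlated-subquery approach, here is a minimal sketch that runs it against the sample rows from the question, using Python's built-in sqlite3 module. The table names counts and names are placeholders, and the names for B and C are made up, since the question only shows the name for A.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE counts (id TEXT, date TEXT, count1 INTEGER, count2 INTEGER);
CREATE TABLE names (id TEXT, name TEXT);
INSERT INTO counts VALUES
  ('A', '20120101', 1, 2),
  ('A', '20120201', 2, 2),
  ('B', '20120101', 3, 0),
  ('C', '20111201', 1, 0),
  ('C', '20120301', 2, 2);
-- Only the name for A appears in the question; the other two are hypothetical.
INSERT INTO names VALUES ('A', 'MYNAME'), ('B', 'NAME_B'), ('C', 'NAME_C');
""")

# Keep a row only if its date equals the newest date recorded for the same id.
latest_rows = conn.execute("""
SELECT c.id, n.name, c.count1, c.count2
FROM counts AS c
JOIN names AS n ON n.id = c.id
WHERE c.date = (SELECT MAX(c2.date) FROM counts AS c2 WHERE c2.id = c.id)
ORDER BY c.id
""").fetchall()

for row in latest_rows:
    print(row)
```

The yyyymmdd strings compare correctly as text, so MAX works here without converting them to a date type.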

Related

joining two tables on id for the first row available SQL BigQuery

I have two tables that I need to join on date and id. The first table has date, id, name columns. Each name is associated with a couple of ids. The data looks like this:
date id name
7/11 1 A
7/11 1 A
7/11 1 A
7/11 1 A
7/11 1 A
7/11 2 A
7/11 2 A
7/11 2 A
7/11 2 A
7/11 2 A
The other table has Date, id, shares. It does not have the name associated with the id. The table looks like this:
date id shares
7/11 1 5
7/11 2 4
The end goal is to get the sum of the shares per name or rather per the list of the ids associated with the name. Here is the code:
SELECT t1.date, t1.name,
COALESCE(SUM(t2.shares), 0) shares
FROM table1 t1 LEFT JOIN table2 t2
ON t2.date = t1.date AND t2.id = t1.id
GROUP BY t1.date, t1.name
This works perfectly well, but because table1 lists the same id 5 times, the sum is 5 times bigger than it is supposed to be. So I only need to grab the first row for each id from table1 in the JOIN.
The desired output is this:
date name shares
7/11 A 9

I think you should fix your data model so there are no duplicates. One option is to remove the duplicates before joining:
SELECT t1.date, t1.name,
COALESCE(SUM(t2.shares), 0) as shares
FROM (SELECT DISTINCT t1.date, t1.id, t1.name
FROM table1 t1
) t1 LEFT JOIN
table2 t2
ON t2.date = t1.date AND t2.id = t1.id
GROUP BY t1.date, t1.name

If you can't fix your underlying data to remove duplicates, it might be a good idea to use CTEs (or a subquery).
with
table_a as (select * from `project.dataset.table_a`),
table_b as (select * from `project.dataset.table_b`),
deduped_a as (select distinct date, id, name from table_a)
select
date,
name,
sum(coalesce(shares,0)) as shares
from deduped_a
left join table_b using(id, date)
group by 1,2
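To see the effect of removing duplicates before the join, the following sketch loads the sample rows from the question into an in-memory SQLite database (not BigQuery, so treat it as an illustration of the logic only) and compares the original query with the deduplicated one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (date TEXT, id INTEGER, name TEXT)")
conn.execute("CREATE TABLE table2 (date TEXT, id INTEGER, shares INTEGER)")

# table1 lists each id five times, as in the question.
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                 [("7/11", 1, "A")] * 5 + [("7/11", 2, "A")] * 5)
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                 [("7/11", 1, 5), ("7/11", 2, 4)])

# Original query: every duplicate row in table1 repeats the matching shares.
naive = conn.execute("""
SELECT t1.date, t1.name, COALESCE(SUM(t2.shares), 0)
FROM table1 t1 LEFT JOIN table2 t2 ON t2.date = t1.date AND t2.id = t1.id
GROUP BY t1.date, t1.name
""").fetchall()

# Deduplicate table1 first, then join and sum.
deduped = conn.execute("""
SELECT t1.date, t1.name, COALESCE(SUM(t2.shares), 0)
FROM (SELECT DISTINCT date, id, name FROM table1) t1
LEFT JOIN table2 t2 ON t2.date = t1.date AND t2.id = t1.id
GROUP BY t1.date, t1.name
""").fetchall()

print(naive)    # inflated sum
print(deduped)  # desired sum
```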

Union All but keep only duplicates from one table in T-SQL

I have two tables that I would like to union, keeping only the duplicates from one of the two tables. I tried to find a solution but could not find one anywhere. I hope somebody can help.
For example:
Table_1:
ID Product Amount
1 A 10
2 B 10
3 C 10
Table_2:
ID Product Amount
3 C 9
4 A 100
5 B 100
Desired result:
ID Product Amount
1 A 10
2 B 10
3 C 9
4 A 100
5 B 100
So always use the duplicates from Table_2. In this example ID 3 is duplicated, so use the Table_2 row with amount 9.
How can I do this in T-SQL? I used the code below:
Select * from Table_1 where Table_1.id != Table_2.id
Union All
Select * from Table_2
But then I receive the error:
'The multi-part identifier "Table_2.ID" could not be bound.'

Use not exists:
Select t1.*
from Table_1 t1
where not exists (select 1 from table_2 t2 where t2.id = t1.id)
Union All
Select t2.*
from Table_2 t2;
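The NOT EXISTS pattern can be tried out on the sample tables from the question. This is only a sketch: it uses SQLite through Python's sqlite3 module rather than SQL Server, and the ORDER BY is added just to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table_1 (ID INTEGER, Product TEXT, Amount INTEGER);
CREATE TABLE Table_2 (ID INTEGER, Product TEXT, Amount INTEGER);
INSERT INTO Table_1 VALUES (1, 'A', 10), (2, 'B', 10), (3, 'C', 10);
INSERT INTO Table_2 VALUES (3, 'C', 9), (4, 'A', 100), (5, 'B', 100);
""")

# Take Table_1 rows whose ID is absent from Table_2, then append all of Table_2.
merged = conn.execute("""
SELECT t1.ID, t1.Product, t1.Amount
FROM Table_1 t1
WHERE NOT EXISTS (SELECT 1 FROM Table_2 t2 WHERE t2.ID = t1.ID)
UNION ALL
SELECT t2.ID, t2.Product, t2.Amount
FROM Table_2 t2
ORDER BY 1
""").fetchall()

for row in merged:
    print(row)
```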

Try this:
SELECT T1.*
FROM #Table1 T1
WHERE T1.ID NOT IN (SELECT ID FROM #Table2)
UNION
SELECT T2.*
FROM #Table2 T2

I assume what you want is an EXISTS:
SELECT T1.ID,
T1.Product,
T1.Amount
FROM dbo.Table1 T1
WHERE NOT EXISTS (SELECT 1
FROM dbo.Table2 T2
WHERE T1.ID = T2.ID)
UNION ALL
SELECT T2.ID,
T2.Product,
T2.Amount
FROM dbo.Table2 T2;
A FULL OUTER JOIN, however, might also work if ID is unique in both tables:
SELECT ISNULL(T2.ID,T1.ID) AS ID,
ISNULL(T2.Product,T1.Product) AS Product,
ISNULL(T2.Amount,T1.Amount) AS Amount
FROM dbo.Table1 T1
FULL OUTER JOIN dbo.Table2 T2 ON T1.ID = T2.ID;

UNION returns only distinct rows, whereas UNION ALL returns every row, duplicates included. Here you can use UNION ALL to combine both tables and then keep a single row per ID:
SELECT
B.ID
,B.Product
,B.Amount
FROM
(
SELECT
A.ID
,A.Product
,A.Amount
,ROW_NUMBER() over (Partition BY ID order by src DESC) AS [row_num] -- the tb_2 row (src = 2) ranks first
FROM
(
SELECT
tb_1.*, 1 AS src
FROM tb_1
UNION ALL
SELECT
tb_2.*, 2 AS src
FROM tb_2
) AS A
) AS B
WHERE B.[row_num] = 1

SQL Get Count Across 3 Tables

I have 3 tables:
table1:
id, name
0 foo
1 etc
2 example
table2:
id table1_id
0 1
1 0
2 2
table3:
id table2_id
0 1
1 0
2 0
Which query finds all names from table1 where ALL corresponding ids in table2 have a count of at least n in table3? I.e. if n was 1 it should return foo and etc.
EDIT:
I explained that poorly. I'm trying to get the name of every record in table1 where ALL corresponding records in table2 (i.e. records whose table1_id column equals that table1 id; in my example tables, each id has one) have a count in table3 of at least n.
If n was 1: table2_id 0 appears twice in table3 (records 1 and 2), so its 'parent' would be returned. It corresponds to table1 record 1, so the name of the record with table1 id 1 should be returned, which is etc. foo also qualifies, as its table2 record (id 1) has a count of 1 in table3; however, the table2 record for example (id 2) does not appear in table3, so example shouldn't be returned.
Expected result:
name
foo
etc

You can do this using a subquery in the where clause:
select t1.*
from table1 t1
where (select count(t3.id)
from table2 t2 left join
table3 t3
on t3.table2_id = t2.id
where t2.table1_id = t1.id
group by t2.id
order by count(t3.id) asc -- to get the minimum (count(*) would count unmatched rows as 1)
limit 1
) >= ? -- value you care about
I suspect that this might have the best performance with appropriate indexes: table2(table1_id, id) and table3(table2_id).

If I have understood the question - if a check on table3.table2_id is greater than 0, the answer would be 'etc' ?
Code below
select t1.name
from
(
select 0 as id, 1 as table2_id
union select 1, 0
union select 2 , 0
) t3
inner join
(
select 0 as id , 1 as table_id
union select 1, 0
union select 2, 2
) t2 on t2.table_id = t3.table2_id
inner join
(
select 0 as id, 'foo' as name
union select 1 , 'etc'
union select 2 , 'example'
) t1 on t1.id = t2.table_id
where t3.table2_id > 0

select table1.name
FROM table1
INNER JOIN table2 ON table1.id=table2.table1_id
INNER JOIN table3 ON table2.id=table3.table2_id
GROUP BY table1.name
HAVING count(*) >= 1
Replace the last 1 with whatever n you desire.
Here's the sql fiddle if you want to play around with it: http://sqlfiddle.com/#!7/14217/4

Use an INNER join of table1 to table2 and then a LEFT join to table3 and count the corresponding ids of table3.
Then by a 2nd level of aggregation return only the rows of table1 where all the counters are at least 1:
SELECT id, name
FROM (
SELECT t1.id, t1.name, COUNT(t3.id) counter
FROM table1 t1
INNER JOIN table2 t2 ON t2.table1_id = t1.id
LEFT JOIN table3 t3 ON t3.table2_id = t2.id
GROUP BY t1.id, t1.name, t2.id
) t -- alias required by some databases (e.g. MySQL, PostgreSQL)
GROUP BY id, name
HAVING MIN(counter) >= 1 -- change to the number that you want
See the demo.
Results:
id name
0 foo
1 etc
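The two-level aggregation can be checked against the sample tables for several values of n. The sketch below runs it in SQLite via Python's sqlite3 module; the helper name names_with_min_count and the subquery alias per_child are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, name TEXT);
CREATE TABLE table2 (id INTEGER, table1_id INTEGER);
CREATE TABLE table3 (id INTEGER, table2_id INTEGER);
INSERT INTO table1 VALUES (0, 'foo'), (1, 'etc'), (2, 'example');
INSERT INTO table2 VALUES (0, 1), (1, 0), (2, 2);
INSERT INTO table3 VALUES (0, 1), (1, 0), (2, 0);
""")

# Inner query: one counter per table2 row (0 when table3 has no match).
# Outer query: keep a table1 row only if its smallest counter reaches n.
QUERY = """
SELECT id, name
FROM (
    SELECT t1.id, t1.name, COUNT(t3.id) AS counter
    FROM table1 t1
    INNER JOIN table2 t2 ON t2.table1_id = t1.id
    LEFT JOIN table3 t3 ON t3.table2_id = t2.id
    GROUP BY t1.id, t1.name, t2.id
) AS per_child
GROUP BY id, name
HAVING MIN(counter) >= ?
ORDER BY id
"""

def names_with_min_count(n):
    """Return the table1 names whose table2 rows all have at least n table3 rows."""
    return [name for _id, name in conn.execute(QUERY, (n,))]

for n in (1, 2, 3):
    print(n, names_with_min_count(n))
```

Using COUNT(t3.id) rather than COUNT(*) matters here: with the LEFT JOIN, an unmatched table2 row still produces one joined row, and only COUNT(t3.id) reports it as 0.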

table of the number of occurrences of numbers in the column postgresql

I have a table:
My select:
select regexp_split_to_table(t3."Id"::character varying,'') as s
from (select t1."Id" from table1 t1
union all
select t2."Id" from table2 t2) t3
order by s
Or also I can get a string '22173345566179111134546175622323811' with this:
select string_agg(t3."Id"::character varying,'') as s
from (select t1."Id" from table1 t1
union all
select t2."Id" from table2 t2) t3
I need to get a table with number|count data, i.e. for each digit, the count of its occurrences in the select, for example:
1 | 9
2 | 5
3 | 5
and so on..
PostgreSQL DBMS

Does this do what you want?
select t3."Id", count(*)
from (select t1."Id" from table1 t1
union all
select t2."Id" from table2 t2
) t3
group by t3."Id"
order by t3."Id";

If I understand you correctly, you want a list of all digits that occur in the set of IDs from the two tables, along with how often each digit appears across all these IDs. If so, you just need to GROUP BY the digit and use count().
SELECT s.d,
count(*) count
FROM (SELECT t1."Id"
FROM table1 t1
UNION ALL
SELECT t2."Id"
FROM table2 t2) t3
CROSS JOIN LATERAL regexp_split_to_table(t3."Id"::character varying, '') s(d)
GROUP BY s.d
ORDER BY s.d;

Easiest way:
select regexp_split_to_table(t3."Id"::character varying,'') s, count(*) count
from (select t1."Id" from table1 t1 union all select t2."Id"from table2 t2) t3
group by s
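The expected counts can be cross-checked outside PostgreSQL. The sketch below is not the regexp_split_to_table approach from the answers: SQLite has no such function, so it swaps in a recursive CTE that walks the concatenated string quoted in the question one character at a time, then groups by digit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Concatenated ids, copied from the string_agg output quoted in the question.
s = "22173345566179111134546175622323811"

# The recursive CTE emits one row per character position; the final
# SELECT groups those rows by digit and counts them.
digit_counts = conn.execute("""
WITH RECURSIVE digits(pos, d) AS (
    SELECT 1, substr(:s, 1, 1)
    UNION ALL
    SELECT pos + 1, substr(:s, pos + 1, 1)
    FROM digits
    WHERE pos < length(:s)
)
SELECT d, COUNT(*) FROM digits GROUP BY d ORDER BY d
""", {"s": s}).fetchall()

for digit, count in digit_counts:
    print(digit, "|", count)
```

The first three rows agree with the 1 | 9, 2 | 5, 3 | 5 listed in the question.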

SQL Select row from table with some minimum value from another table

I suppose this is not so hard, but I cannot get it.
For example I have table T1:
ID
-----
1000
1001
And I have table T2:
ID GROUP DATE
--------------------------
1000 ADSL 2.2.2012
1000 null 3.2.2012
1000 NOC 4.2.2012
1001 NOC 5.2.2012
1001 null 6.2.2012
1001 TV 7.2.2012
I want to select from T1 only the rows whose GROUP value in T2 is NOC, but only if NOC is the group for the minimum DATE value of that ID in T2.
So my result in this case would be only 1001, because for its minimum DATE 5.2.2012 the group is NOC!
I do not want any joins, and I cannot hard-code IDs (where id=1000 or id=1001) because this is just an example of some big table.
Also important: I cannot use t1.id = t2.id, because in the application where I am using this I cannot write the whole SQL expression, only part of it. I can only use id.
I tried something like:
select id
from t1
where
id in (select id from t2
where group = 'NOC'
and date in (select min(date) from t2
where id in (select id from t1)
)
)
But this does not work.
I know it seems a little confusing, but I really can't use where t1.id = t2.id
Thanks

If T2.ID is a foreign key referencing T1.ID, you don't really need the T1 table, because all the IDs could be obtained from T2 only:
SELECT o.ID
FROM T2 AS o
WHERE EXISTS (
SELECT MIN(i.DATE)
FROM T2 AS i
WHERE i.ID = o.ID
HAVING MIN(i.DATE) = o.DATE
)
AND o."GROUP" = 'NOC'
But if you insist on involving T1, you just need to modify the above like this:
SELECT *
FROM T1
WHERE ID IN (
SELECT o.ID
FROM T2 AS o
WHERE o."GROUP" = 'NOC'
AND EXISTS (
SELECT MIN(i.DATE)
FROM T2 AS i
WHERE i.ID = o.ID
HAVING MIN(i.DATE) = o.DATE
)
)

Can you do this in multiple steps?
First of all, to get the minimum date per id, you would need:
select id, peoplegroup, min(date)
from t2
group by id
That will give you
1000 ADSL 2.2.2012
1001 NOC 5.2.2012
Call this table t3.
Then do
select id
from t3
where peoplegroup = 'NOC'
and id in (
select id from t1
)

Try this:
select id from t1 where id in
(select id from t2 where group = 'NOC' and date =
(select min(date) from t2 where id = t1.id))
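For reference, here is a small sketch that checks the "NOC at the earliest date" logic on the sample data, using SQLite through Python's sqlite3 module. Two assumptions: the dates are rewritten as ISO strings (reading 2.2.2012 as day.month.year) so that MIN orders them correctly as text, and the subquery correlates only inside T2, so the outer query refers to T1 by ID alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (ID INTEGER);
CREATE TABLE T2 (ID INTEGER, "GROUP" TEXT, DATE TEXT);
INSERT INTO T1 VALUES (1000), (1001);
-- Dates from the question, rewritten as ISO strings (an assumption).
INSERT INTO T2 VALUES
  (1000, 'ADSL', '2012-02-02'),
  (1000, NULL,   '2012-02-03'),
  (1000, 'NOC',  '2012-02-04'),
  (1001, 'NOC',  '2012-02-05'),
  (1001, NULL,   '2012-02-06'),
  (1001, 'TV',   '2012-02-07');
""")

# An ID qualifies when its NOC row carries the earliest date recorded for that ID.
noc_first_ids = conn.execute("""
SELECT ID FROM T1
WHERE ID IN (
    SELECT o.ID FROM T2 AS o
    WHERE o."GROUP" = 'NOC'
      AND o.DATE = (SELECT MIN(i.DATE) FROM T2 AS i WHERE i.ID = o.ID)
)
""").fetchall()

print(noc_first_ids)
```

GROUP is a reserved word, which is why the column name is double-quoted throughout.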
