Select count(*) from table where (multiple id) in (table) - sql

Is there a way to write
SELECT count(*) from tablename where (multiple_ids_here) in (SELECT id from tablename)
Normally, I would write:
select count(*) from tablename
where id_1 in (SELECT id from tablename)
OR id_2 in (SELECT id from tablename)
OR id_3 in (SELECT id from tablename)
which is very inefficient if we have multiple values.
Anyone?
EDIT: Question updated. What if I want to select count?

Your version with three ins is probably the most efficient way of doing this. If you want a comparison to try, you can use exists:
select . . .
from t t1
where exists (select 1
from tablename t2
where t2.id in (t1.id_1, t1.id_2, t1.id_3)
);
I should also note that storing ids in multiple columns like this is usually a sign of a problem with the data model. You probably want a table with one row per id, rather than one column per id. Such a format would also simplify this type of query.
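To make the data-model point concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for your database. The sample rows, the row_id column and the t_ids table are assumptions made up for illustration. It counts matching rows once with the exists query against the wide table, and once against a one-row-per-id layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablename (id INTEGER PRIMARY KEY);
    -- Wide layout from the question: one column per id (row_id is hypothetical).
    CREATE TABLE t (row_id INTEGER PRIMARY KEY, id_1 INTEGER, id_2 INTEGER, id_3 INTEGER);
    INSERT INTO tablename (id) VALUES (1), (2), (3);
    INSERT INTO t VALUES (10, 1, 8, 9), (11, 7, 8, 9), (12, 7, 2, 3);
""")

# Count rows of t where at least one of the three id columns exists in tablename.
wide_count = conn.execute("""
    SELECT count(*)
    FROM t t1
    WHERE EXISTS (SELECT 1
                  FROM tablename t2
                  WHERE t2.id IN (t1.id_1, t1.id_2, t1.id_3))
""").fetchone()[0]

# Normalized layout (hypothetical t_ids table): one row per (row_id, related_id).
conn.executescript("""
    CREATE TABLE t_ids (row_id INTEGER, related_id INTEGER);
    INSERT INTO t_ids VALUES (10, 1), (10, 8), (10, 9),
                             (11, 7), (11, 8), (11, 9),
                             (12, 7), (12, 2), (12, 3);
""")

# The same question becomes a single IN, however many ids a row has.
narrow_count = conn.execute("""
    SELECT count(DISTINCT row_id)
    FROM t_ids
    WHERE related_id IN (SELECT id FROM tablename)
""").fetchone()[0]

print(wide_count, narrow_count)
```

With the normalized layout, adding a fourth or fifth id means inserting more rows rather than editing the query.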

For the updated question about getting a count(*), you can use cross apply() with values() to unpivot your data in a common table expression:
;with cte as (
select t.Id, v.RelatedId
from t
cross apply (values (id_1),(id_2),(id_3)) v(RelatedId)
)
select
cte.Id
, RelationCount = count(*)
from cte
inner join RelatedTable r
on cte.RelatedId = r.Id
group by cte.Id

I am not sure I understand your question. Could you give an example of the data you are using and the expected outcome?
From what I understand, you could use a CTE like this:
;WITH Sales_CTE ([counts],CustomerID, SalespersonPersonID,PickedByPersonID)
AS
(
select count(*),CustomerID,SalespersonPersonID ,PickedByPersonID
from [WideWorldImporters].[Sales].[Orders]
group by CustomerID,SalespersonPersonID,PickedByPersonID
)
SELECT sum([counts])
FROM Sales_CTE
GO
This returns a single total. You would just have to change the columns to match your table.

Related

SQL - Removing Row Groups

I have a table with the following information:
Is there a way to remove all groups which have multiple IDs? For example group 3 would be removed because it consists of ID 1 and 2.
Thank you!
A simple, portable and efficient approach is not exists:
select t.*
from mytable t
where not exists (
select 1
from mytable t1
where t1.group = t.group and t1.id <> t.id
)
For performance, consider an index on (group, id).
Side note: group is a SQL keyword (as in group by), hence not a good choice for a column name.
You can use below query to remove all groups having multiple IDs
Delete from <your_table_name> where Group in (select Group from <your_table_name> group by Group having count(distinct ID) > 1)
The inner query returns the groups that have multiple IDs.
select * from temp where group in (
select group from temp group by group having count(distinct id) > 1)
delete from temp where group in (
select group from temp group by group having count(distinct id) > 1)
Try to execute below query:
select id,group from table where group in
(
select group from(
select group,count(distinct id) as cn from table group by 1 having cn=1) a
)
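The question's table did not survive, so here is a small sketch on made-up data, using Python's sqlite3 module as a stand-in. The column is named grp (an assumption) to sidestep the keyword issue mentioned above. It runs the not exists filter, then a delete based on count(distinct id), and both leave the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data: group 3 contains two different ids (1 and 2).
    CREATE TABLE mytable (grp INTEGER, id INTEGER);
    INSERT INTO mytable VALUES (1, 1), (1, 1), (2, 2), (3, 1), (3, 2);
""")

# Keep only rows whose group has no row with a different id.
kept = conn.execute("""
    SELECT t.grp, t.id
    FROM mytable t
    WHERE NOT EXISTS (SELECT 1
                      FROM mytable t1
                      WHERE t1.grp = t.grp AND t1.id <> t.id)
    ORDER BY t.grp, t.id
""").fetchall()

# Physically remove groups that contain more than one distinct id.
conn.execute("""
    DELETE FROM mytable
    WHERE grp IN (SELECT grp
                  FROM mytable
                  GROUP BY grp
                  HAVING count(DISTINCT id) > 1)
""")
remaining = conn.execute(
    "SELECT grp, id FROM mytable ORDER BY grp, id"
).fetchall()

print(kept)
print(remaining)
```

Note that group 1 survives even though it has two rows, because both rows carry the same id. Grouping by both group and id would count rows per pair instead, which is a different test.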

Multiple statements with one cte?

What is the correct syntax for the following? (I need these in one query)
--- 1. task
update A set .... where ....
insert into A (...) values (...);
--- 2 .task
With cte as (select A.column...)
update A set ... if condition1(includes cte table)
update A set ... if condition2(includes cte table)
update A set ... if condition3(includes cte table)
In words:
I update table A or insert into it
After that I refer to this updated TableA in a cte table, which contains a ROW_NUMBER function,
And then I want to update TableA again depended on that rownumber from CTE in a specific row, for example: if rownumber value in the CTE is 1, do this, if it is max(rownumber) for that specific row then do that....
I read that cte-s only persist for a single statement. I tried to copy the cte for every update statement, separated with semicolons, but that didn't work. I read about MERGE but I'm not sure if this is the right way for that. Is it the OUTPUT clause, if yes, how to use it? Or something else? Can you help me please?
You don't need multiple update statements. You can do it with one.
; WITH CTE as ( select . . . )
UPDATE A
SET col1 = case when .... then new_1 else col1 end,
col2 = case when .... then new_2 else col2 end
FROM CTE as A
For anyone who wants to avoid copy-paste and reuse a complex query from a CTE across multiple statements, a temporary table will probably work:
select A.column as AnyColumn, ... into #NameYouWant from A, ...
update A set column1 = (select AnyColumn from #NameYouWant where ...)
update B set column2 = (select AnyColumn from #NameYouWant where ...)
update C set column3 = (select AnyColumn from #NameYouWant where ...)
Please note that the number sign (#) in the table name is important: it indicates that the table is temporary, so it persists only within your current session.
Correct, a CTE only persists in the statement it exists in. Therefore a statement like the following will fail:
WITH CTE AS(
SELECT *,ROW_NUMBER() OVER (PARTITION BY Date ORDER BY ID) AS RN
FROM YourTable)
SELECT *
FROM CTE
WHERE RN = 1;
SELECT *
FROM CTE
WHERE RN = 2;
That's because the CTE no longer exists during the second statement.
For an UPDATE (or any other statement) you'll therefore need to redeclare your CTE each time. Thus, using the example above, you would have to do:
WITH CTE AS(
SELECT *,ROW_NUMBER() OVER (PARTITION BY Date ORDER BY ID) AS RN
FROM YourTable)
SELECT *
FROM CTE
WHERE RN = 1;
WITH CTE AS(
SELECT *,ROW_NUMBER() OVER (PARTITION BY Date ORDER BY ID) AS RN
FROM YourTable)
SELECT *
FROM CTE
WHERE RN = 2;
If the expressions within the CTE are quite complex, and you use them often, you might instead consider using a VIEW.
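A quick way to see the scoping rule in action is the sketch below, which uses Python's sqlite3 module rather than SQL Server, so treat it as an illustration of the general behaviour only. The sample rows and the view name Numbered are assumptions. It needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample rows.
    CREATE TABLE YourTable (ID INTEGER, Date TEXT);
    INSERT INTO YourTable VALUES (1, '2020-01-01'), (2, '2020-01-01'), (3, '2020-01-02');
""")

cte = """
    WITH CTE AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY Date ORDER BY ID) AS RN
        FROM YourTable)
"""

# The CTE is visible to the statement it is attached to ...
first = conn.execute(cte + "SELECT ID FROM CTE WHERE RN = 1 ORDER BY ID").fetchall()

# ... but not to a later, separate statement.
try:
    conn.execute("SELECT ID FROM CTE WHERE RN = 2")
    second_failed = False
except sqlite3.OperationalError:
    second_failed = True

# A view keeps the definition around for as many statements as needed.
conn.execute("""
    CREATE VIEW Numbered AS
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Date ORDER BY ID) AS RN
    FROM YourTable
""")
rn1 = conn.execute("SELECT ID FROM Numbered WHERE RN = 1 ORDER BY ID").fetchall()
rn2 = conn.execute("SELECT ID FROM Numbered WHERE RN = 2 ORDER BY ID").fetchall()

print(first, second_failed, rn1, rn2)
```

One difference from a temporary table: a view is re-evaluated on every use, so it reflects updates made between statements, whereas a temporary table is a snapshot taken when it was filled.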

Scalable Solution to get latest row for each ID in BigQuery

I have a fairly large table with a field ID and another field collection_time. I want to select the latest record for each ID. Unfortunately, the combination (ID, collection_time) is not unique in my data. I want just one of the records with the maximum collection time. I have tried two solutions, but neither has worked for me:
First: using query
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY collection_time DESC) as rn
FROM mytable) where rn=1
This results in a Resources exceeded error, which I guess is because of the ORDER BY in the query.
Second: using a join between the table and the latest time:
(SELECT tab1.*
FROM mytable AS tab1
INNER JOIN EACH
(SELECT ID, MAX(collection_time) AS second_time
FROM mytable GROUP EACH BY ID) AS tab2
ON tab1.ID=tab2.ID AND tab1.collection_time=tab2.second_time)
This solution does not work for me because (ID, collection_time) is not unique, so the JOIN result contains multiple rows for some IDs.
I am wondering if there is a workaround for the resourcesExceeded error, or a different query that would work in my case?
SELECT
agg.table.*
FROM (
SELECT
id,
ARRAY_AGG(STRUCT(table)
ORDER BY
collection_time DESC)[SAFE_OFFSET(0)] agg
FROM
`dataset.table` table
GROUP BY
id)
This will do the job and it scales. It also keeps working if the schema changes, because the whole row is aggregated as a struct and no column names are listed.
Short and scalable version:
select array_agg(t order by collection_time desc limit 1)[offset(0)].*
from mytable t
group by t.id;
Quick and dirty option: combine both of your queries into one. First get all records with the latest collection_time (using your second query), and then dedup them using your first query:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY tab1.ID) AS rn
FROM (
SELECT tab1.*
FROM mytable AS tab1
INNER JOIN (
SELECT ID, MAX(collection_time) AS second_time
FROM mytable GROUP BY ID
) AS tab2
ON tab1.ID=tab2.ID AND tab1.collection_time=tab2.second_time
)
)
WHERE rn = 1
And with Standard SQL (proposed by S.Mohsen sh)
WITH myTable AS (
SELECT 1 AS ID, 1 AS collection_time
),
tab1 AS (
SELECT ID,
MAX(collection_time) AS second_time
FROM myTable GROUP BY ID
),
tab2 AS (
SELECT * FROM myTable
),
joint AS (
SELECT tab2.*
FROM tab2 INNER JOIN tab1
ON tab2.ID=tab1.ID AND tab2.collection_time=tab1.second_time
)
SELECT * EXCEPT(rn)
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID) AS rn
FROM joint
)
WHERE rn=1
If you don't mind writing a piece of code for every column:
SELECT ID,
ARRAY_AGG(col1 ORDER BY collection_time DESC)[OFFSET(0)] AS col1,
ARRAY_AGG(col2 ORDER BY collection_time DESC)[OFFSET(0)] AS col2
FROM myTable
GROUP BY ID
I see no one has mentioned window functions with QUALIFY:
SELECT *, MAX(collection_time) OVER (PARTITION BY id) AS max_timestamp
FROM my_table
QUALIFY collection_time = max_timestamp
The window function adds a column max_timestamp that is accessible in the QUALIFY clause to filter on.
As per your comment, assuming you have a table of unique IDs for which you need to find the latest collection_time, here is another way to do it using a correlated subquery. Give it a try.
SELECT id,
(SELECT Max(collection_time)
FROM mytable B
WHERE A.id = B.id) AS Max_collection_time
FROM id_table A
Another solution, which could be more scalable since it avoids multiple scans of the same table (which happen with both the self-join and the correlated subquery in the answers above). This solution only works with standard SQL (uncheck the "Use Legacy SQL" option):
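SELECT
ID,
-- An aggregate such as MAX() is not allowed in WHERE, so sort and take the first row instead.
(SELECT AS STRUCT srow.*
FROM UNNEST(t.srows) srow
ORDER BY srow.collection_time DESC
LIMIT 1) AS latest
FROM
(SELECT ID, ARRAY_AGG(STRUCT(col1, col2, col3, ...)) srows
FROM id_table
GROUP BY ID) t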
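To check the logic of these approaches on tied timestamps, here is a small sketch using Python's sqlite3 module. It is only a stand-in: it says nothing about BigQuery resource limits, and the sample rows and the payload column are assumptions. It compares the join on MAX(collection_time) with a ROW_NUMBER() filter ordered descending (needs SQLite 3.25 or later for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: ID 1 has two rows sharing the latest collection_time.
    CREATE TABLE mytable (ID INTEGER, collection_time INTEGER, payload TEXT);
    INSERT INTO mytable VALUES (1, 100, 'a'), (1, 200, 'b'), (1, 200, 'c'), (2, 50, 'd');
""")

# Join against the per-ID maximum: ties produce several rows per ID.
joined = conn.execute("""
    SELECT tab1.ID, tab1.collection_time
    FROM mytable AS tab1
    JOIN (SELECT ID, MAX(collection_time) AS second_time
          FROM mytable
          GROUP BY ID) AS tab2
      ON tab1.ID = tab2.ID AND tab1.collection_time = tab2.second_time
    ORDER BY tab1.ID
""").fetchall()

# ROW_NUMBER() with DESC ordering: exactly one row per ID, ties broken arbitrarily.
latest = conn.execute("""
    SELECT ID, collection_time
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY ID
                                       ORDER BY collection_time DESC) AS rn
          FROM mytable)
    WHERE rn = 1
    ORDER BY ID
""").fetchall()

print(joined)
print(latest)
```

Which of the tied rows wins is not defined, which matches the requirement of wanting "just one of the records with the maximum collection time". Any filter that compares against the maximum without a row number, such as the join above, keeps every tied row.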

How to achieve this without using Sub Query, CTE and Procedure

I have a table with 2 fields
CREATE TABLE Temp_tab
(
id int identity primary key,
value float
);
INSERT INTO Temp_tab(value)
VALUES (65.09),(17.09);
I want to select all the records that are greater than Avg(Value).
Say... Select * from temp_tab where value > (select avg(value) from temp_tab);
The query above (using a subquery) gives me the expected output
1 65.09
I want to achieve this without using a subquery, CTE or procedure, since I am using Spark SQL, which does not support subqueries, CTEs or procedures.
You can do this quite painfully with a cross join and aggregation:
Select t1.id, t1.value
from temp_tab t1 cross join
temp_tab t2
group by t1.id, t1.value
having t1.value > avg(t2.value);
As a note: Spark SQL claims to support subqueries (see here). So, your original query should work. If it only supports subqueries in the from clause, then you can do:
Select t.*
from temp_tab t join
(select avg(value) as avgvalue from temp_tab) a
on t.value > a.avgvalue;
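Both variants can be checked against the sample data from the question. The sketch below uses Python's sqlite3 module instead of Spark, so it only confirms that the two queries are equivalent to the subquery version, not that a given Spark release accepts them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp_tab (id INTEGER PRIMARY KEY AUTOINCREMENT, value REAL);
    INSERT INTO temp_tab (value) VALUES (65.09), (17.09);
""")

# Cross join: every row is paired with every row, so avg(t2.value)
# within each (t1.id, t1.value) group is the table-wide average.
cross = conn.execute("""
    SELECT t1.id, t1.value
    FROM temp_tab t1 CROSS JOIN temp_tab t2
    GROUP BY t1.id, t1.value
    HAVING t1.value > avg(t2.value)
""").fetchall()

# Derived table in the FROM clause: the average is computed once and joined in.
derived = conn.execute("""
    SELECT t.id, t.value
    FROM temp_tab t
    JOIN (SELECT avg(value) AS avgvalue FROM temp_tab) a
      ON t.value > a.avgvalue
""").fetchall()

print(cross, derived)
```

The cross join builds n × n intermediate rows for a table of n rows, which is why it is the painful option. The derived-table form is preferable wherever it is supported.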
spark-sql accepts this query in version 1.6.x:
select * from (select * from tenmin_history order by TS_TIME DESC limit 144) a order by TS_TIME
This query solved my problem.

Sum values from different tables

I read some topics about this but I'm not very good with sql. I have 10 tables with these fields:
value
type
date
I want to sum all the value fields together when they have a specific type. I was trying to do something like this, but it's not working.
select sum(tab1.value) + sum(tab2.value)
from tab1, tab2
where tab1.type = tab2.type = 'box'
I guess I could do many simple queries like these and then sum all the results
select sum(value) from tab1 where type='box'
select sum(value) from tab2 where type='box'
but I wonder if I can do one single query
thanks
Having multiple tables with the same structure is usually a sign of poor database design.
I would suggest that you use your last approach, but put the subqueries in the from clause and then add the results in the select:
select t1.value + t2.value + . .
from (select sum(value) as value from tab1 where type='box') t1 cross join
(select sum(value) as value from tab2 where type='box') t2 cross join
. . .
Alternatively, you could union all them together in the from clause and then take the sum:
select sum(value)
from ((select sum(value) as value from tab1 where type='box') union all
(select sum(value) as value from tab2 where type='box') union all
. . .
) t;
If the tables are not linked via FK/PK you can use multiple sub-queries:
SELECT (SELECT SUM(tab1.value) FROM tab1 WHERE type='box') as Tab1Sum,
(SELECT SUM(tab2.value) FROM tab2 WHERE type='box') as Tab2Sum -- and so on...
This yields a single record where each column is the sum of each table.
1. Use a single select:
DECLARE @type NVARCHAR(255) = N'Box';
SELECT (SELECT SUM(value) FROM tab1 WHERE type=@type)
+ (SELECT SUM(value) FROM tab2 WHERE type=@type)
+ (SELECT SUM(value) FROM tab3 WHERE type=@type)
+ (...)
I think it's the simplest one.
2. Create a view:
CREATE VIEW tabs
AS
SELECT value, type FROM tab1
UNION ALL -- UNION ALL keeps duplicate rows; plain UNION would drop them and understate the sum
SELECT value, type FROM tab2
UNION ALL
SELECT value, type FROM tab3
UNION ALL
...
Then
SELECT SUM(value) FROM tabs WHERE type = 'BOX'
3. Think about why similar columns live in different tables. Can they be merged into a single table?
If the answer is no, and you have too many tables, consider concatenating SQL strings and using sp_executesql to execute them.
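For comparison, here is a small sketch of the scalar-subquery and union approaches on two made-up tables, run through Python's sqlite3 module as a stand-in for your database. The sample rows are assumptions. It also shows what happens if the tables are combined with plain UNION instead of UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data: both tables contain a (5, 'box') row.
    CREATE TABLE tab1 (value INTEGER, type TEXT, date TEXT);
    CREATE TABLE tab2 (value INTEGER, type TEXT, date TEXT);
    INSERT INTO tab1 VALUES (5, 'box', '2020-01-01'), (7, 'bag', '2020-01-01');
    INSERT INTO tab2 VALUES (5, 'box', '2020-01-02'), (3, 'box', '2020-01-03');
""")

# One scalar subquery per table, added together.
scalar_total = conn.execute("""
    SELECT (SELECT SUM(value) FROM tab1 WHERE type = 'box')
         + (SELECT SUM(value) FROM tab2 WHERE type = 'box')
""").fetchone()[0]

# UNION ALL keeps every row, so the sum matches.
union_all_total = conn.execute("""
    SELECT SUM(value)
    FROM (SELECT value, type FROM tab1
          UNION ALL
          SELECT value, type FROM tab2)
    WHERE type = 'box'
""").fetchone()[0]

# Plain UNION removes duplicate (value, type) rows before summing.
union_total = conn.execute("""
    SELECT SUM(value)
    FROM (SELECT value, type FROM tab1
          UNION
          SELECT value, type FROM tab2)
    WHERE type = 'box'
""").fetchone()[0]

print(scalar_total, union_all_total, union_total)
```

One caveat with the addition form: SUM over zero matching rows is NULL, and NULL plus anything is NULL, so wrapping each subquery in COALESCE(..., 0) is worth considering if some tables may have no rows of the requested type.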