How to obtain the number of tuple returned by each SELECT of an UNION query in postgreSQL? - sql

I have a query consiting a union of two SELECT queries. Each SELECT query returns an info. I'm interested in obtaining the number of tuple returned by each SELECT query.
I want to do something like :
SELECT table1.info AS info, num_result_tuple()
FROM table1, table2
WHERE table1.info < table2.info
UNION
SELECT table3.info AS info, num_result_tuple()
FROM table3, table4
WHERE table3.info < table4.info
Where num_result_tuple() must represent the number of tuples found by each simple query.
is there such a function in postgresql ? or is there a another way to achieve this ?

You can use a window function for this:
SELECT table1.info AS info, count(*) over () as part_count
FROM table1, table2
WHERE table1.info < table2.info
UNION
SELECT table3.info AS info, count(*) over ()
FROM table3, table4
WHERE table3.info < table4.info;
You probably want UNION ALL instead of UNION. UNION will remove duplicate rows between the two select parts. If you know that you can't have any duplicates (or you do want to return them) UNION ALL will be faster.
If you need to remove duplicates from the original query, adding the count() will change the result. Imagine the following partial results:
First query:
info | part_count
-----+-----------
a | 1
Second query:
info | part_count
-----+-----------
a | 2
b | 2
The union would return
info | part_count
-----+-----------
a | 1
a | 2
b | 2
The query without the counts would have returned
info
----
a
b

Related

Split record into 2 records with distinct values based on a unique id

I have a table with some IDs that correspond to duplicate data that i would like to get rid of. They are linked by a groupid number. Currently my data looks like this:
|GroupID|NID1 |NID2 |
|S1 |644763|643257|
|T2 |4759 |84689 |
|W3 |96676 |585876|
In order for the software to run, I need the data in the following format:
|GroupID|NID |
|S1 |644763|
|S1 |643257|
|T2 |4759 |
|T2 |84689 |
|W3 |96676 |
|W3 |585876|
Thank you for your time.
You want union all :
select groupid, nid1 as nid
from table t
union all -- use "union" instead if you don't want duplicate rows
select groupid, nid2
from table t;
In Oracle 12C+, you can use lateral joins:
select t.groupid, v.nid
from t cross apply
(select t.nid1 as nid from dual union all
select t.nid2 as nid from dual
) v;
This is more efficient than union all because it only scans the table once.
You can also express this as:
select t.groupid,
(case when n.n = 1 then t.nid1 when n.n = 2 then t.nid2 end) as nid
from t cross join
(select 1 as n from dual union all select 2 from dual) n;
A little more complicated, but still only one scan of the table.

How to get substring for filter and group by clause in AWS Redshift database

How to get substring from column which contains records for filter and group by clause in AWS Redshift database.
I have table with records like:
Table_Id | Categories | Value
<ID> | ABC1; ABC1-1; XYZ | 10
<ID> | ABC1; ABC1-2; XYZ | 15
<ID> | XYZ | 5
.....
Now I want to filter records based on individual category like 'ABC1' or 'ABC1 and XYZ'
Expected output from query would like:
Table_Id | Categories | Value
<ID> | ABC1 | 25
<ID> | ABC1-1 | 10
<ID> | ABC1-2 | 15
<ID> | XYZ | 30
.....
So need to group results based on individual categories.
If you have at most 3 values in any "categories" cell you can unnest the cells, get the list of unique values and use that list in a join condition like this:
WITH
values as (
select distinct category
from (
select distinct split_part(categories,';',1) as category from your_table
union select distinct split_part(categories,';',2) from your_table
union select distinct split_part(categories,';',3) from your_table
)
where nullif(category,'') is not null
)
SELECT
t2.category
,sum(t1.value)
FROM your_table t1
JOIN values t2
ON split_part(categories,';',1)=t2.category
OR split_part(categories,';',2)=t2.category
OR split_part(categories,';',3)=t2.category
if you have more than 3 options just add another split_part level both in WITH part and the join condition
#JonScott, #AlexYes and other pals who struggle with similar kinda situations.
I found more better approach other than suggested by #AlexYes.
What I did, I flatter category column which result individual records.
Which I can further process.
Query:
select row_number() over(order by 1) as r1,
to_char(timestamptz 'epoch' + date_time * interval '1 second', 'yyyy-mm-dd') AS DAY,
split_part(categories, ';', numbers.n) as catg,
value
from <TABLE>
join numbers
on numbers.n <= regexp_count(category_string, ';') + 1 <OTHER_CONDITIONS>
Explanation:
Two functions are useful here: first, the split_part function, which takes a string, splits it on ';' delimiter, and returns the first, second, ... , nth value specified from the split string; second, regexp_count, which tells us how many times a particular pattern is found in our string.
To do this fully dynamically, you need to transpose or pivot values in "categories" column into separate rows.
Unfortunately, a "fully dynamic" solution (without knowing the different values beforehand) is NOT possible using redshift.
Your options are as follows:
Use the method suggested by AlexYes in another answer. This is
semi-dynamic and is probably your best option.
Outside of Redshift, run some ETL code to perform
the column -> multiple rows ETL.
Create a hardcoded type solution, and perform the pivot something like this:
select table_id,'ABC1' as category, case when concat(Categories,';') ilike '%ABC1;%' then value else 0 end as value from your_table
union all
select table_id,'ABC1-1' as category, case when concat(Categories,';')ilike '%ABC1-1;%' then value else 0 end as value from your_table
union all
etc

`INTERSECT` does not return anything from two tables, separately values are returned fine

I'm not sure what I am doing wrong here since I didn't touch SQL queries for several years plus MSSQL query language is a bit strange to me but after 30 minutes of googling I still cannot find the answer.
Problem
I have two queries that work perfectly fine:
SELECT COUNT(*) AS 'NumberOfAccounts' FROM Accounts
SELECT COUNT(*) AS 'NumberOfUsers' FROM Users
I need to get this information in one go in my API response since I don't want to execute two statements. How can I combine them into one query so it will return table as follows:
+------------------+---------------+
| NumberOfAccounts | NumberOfUsers |
+------------------+---------------+
| 10 | 16 |
+------------------+---------------+
What I have tried
UNION SELECT COUNT(*) AS 'NumberOfAccounts' FROM Accounts UNION SELECT COUNT(*) AS 'NumberOfUsers' FROM Users
This is giving me the result of both tables, however it all pushes it into NumberOfAccounts and the result is invalid for me to parse.
+------------------+
| NumberOfAccounts |
+------------------+
| 10 |
| 16 |
+------------------+
INTRSECT SELECT COUNT(*) AS 'NumberOfAccounts' FROM Accounts INTERSECT SELECT COUNT(*) AS 'NumberOfUsers' FROM Users
This just gives me empty result with only NumberOfAccounts column in it.
You can just put these as subqueries in a select:
SELECT (SELECT COUNT(*) FROM Accounts) as NumberOfAccounts,
(SELECT COUNT(*) FROM Users) as NumberOfUsers
In SQL Server, no FROM clause is needed.
UNION is the wrong usage here. Union will "merge" rows of identical tables (or identical selects) and not columns.
One solution might be:
SELECT AccountCount, UserCount FROM
(SELECT COUNT(*) AS AccountCount, 1 AS Id FROM Accounts) AS a
JOIN
(SELECT COUNT(*) AS UserCount, 1 as Id FROM Users) AS u ON (a.Id = u.Id)
Be aware of the artificial surrogate key 1 you need to insert to join both sub-selects together.
For completeness sake; with UNION ALL you'd do:
SELECT 'NumberOfAccounts' AS what, COUNT(*) AS howmany FROM accounts
UNION ALL
SELECT 'NumberOfUsers' AS what, COUNT(*) AS howmany FROM users;
which results in
+------------------+---------+
| what | howmany |
+------------------+---------+
| NumberOfAccounts | 10 |
| NumberOfUsers | 16 |
+------------------+---------+
And another variation:
WITH cte AS
(
SELECT COUNT(*) AS cntAccounts, 0 AS cntUsers FROM accounts
UNION ALL
SELECT 0 AS cntAccounts, COUNT(*) AS cntUsers FROM users
)
SELECT
SUM(cntAccounts) AS NumberOfAccounts
,SUM(cntUsers ) AS NumberOfUsers
FROM cte
If you want (need) better performance you can get the row counts from the following query which uses sys.dm_db_partition_stats to get the row counts:
SELECT (
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('Accounts')
AND (index_id=0 or index_id=1)) NumberOfAccounts,
(
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('Users')
AND (index_id=0 or index_id=1)) NumberOfUsers

SQL simplifying an except query

I have a database with around 50 million entries showing the status of a device for a given day, simplified to the form:
id | status
-------------
1 | Off
1 | Off
1 | On
2 | Off
2 | Off
3 | Off
3 | Off
3 | On
...
such that each id is guaranteed to have at least 2 rows with an 'off' status, but doesn't have to have an 'on' status. I'm trying to get a list of only the ids that do not have an 'On' status. For example, in the above data set I'd want a query returned with only '2'
The current query is:
SELECT DISTINCT id FROM table
EXCEPT
SELECT DISTINCT id FROM table WHERE status <> 'Off'
Which seems to work, but it's having to iterate over the entire table twice which ends up taking ~10-12 minutes to run per query. Is there a simpler way to do this with only a single query?
You can use WHERE NOT EXISTS instead:
Select Distinct Id
From Table A
Where Not Exists
(
Select *
From Table B
Where A.Id = B.Id
And B.Status = 'On'
)
I would also recommend looking at the indexes on the Status column. 10-12 minutes to run is excessively long. Even with 50m records, with proper indexing, a query like this shouldn't take longer than a second.
To add an index to the column, you can run this (I'm assuming SQL Server, your syntax may vary):
Create NonClustered Index Ix_YourTable_Status On YourTable (Status Asc);
You can use conditional aggregation.
select id
from table
group by id
having count(case when status='On' then 1 end)=0
You can use the help of a SELF JOIN ..
SELECT DISTINCT A.Id
FROM Table A
LEFT JOIN Table B ON A.Id=B.Id
WHERE B.Status='On'
AND B.Id IS NULL

How to get filtered records from Sql Server 2005?

Consider that I have two tables.
One is "Table1" as shown below.
One more table is "Table2" as shown below.
Now here what I need is, I need all the records from Table1 those ID's are not in Table2's Reference column.
Please guide me how to do this.
Thanks in advance.
How to do it with your current schema (impossible to use indexes):
SELECT Table1.*
FROM Table1
WHERE NOT EXISTS
(
SELECT 1
FROM Table2
WHERE CONCAT(',', Table2.Reference, ',') LIKE CONCAT('%,', Table1.ID, ',%')
)
How this works is by completely wrapping every value in the Reference column with commas. You will end up with ,2,3, and ,7,8,9, for your sample data. Then you can safely search for ,<Table1.ID>, within that string.
How to really do it:
Normalize your database, and get rid of those ugly, useless comma-separated lists.
Fix your table2 to be:
SlNo | Reference
------+-----------
1 | 2
1 | 3
2 | 7
2 | 8
2 | 9
and add a table2Names as:
SlNo | Name
------+---------
1 | Test
2 | Test 2
Then you can simply do:
SELECT Table1.*
FROM Table1
WHERE NOT EXISTS(SELECT 1 FROM Table2 WHERE Table2.Reference = Table1.ID)