Consider a table with the following fields.
Appid Client_name is_real RTT
100 C1 1 1
200 C1 1 6
200 C2 1 7
100 C1 1 9
200 C1 0 7
Now I need the total number of unique real Appids in the table. An Appid record counts as real if 'is_real' is 1.
In the table above, there are only 3 real Appids: (100, C1), (200, C1) and (200, C2).
PostgreSQL command:
Select sum(r)
from (select count(is_real) as r from table group by Appid, Client_name) as t;
I don't want a recursive query. If this can be fetched with a single SELECT query, that would be helpful.
Since you seem to define a unique id by (Appid, Client_name) (which is confusing, since you are mixing terms):
SELECT COUNT(DISTINCT (Appid, Client_name)) AS ct
FROM tbl
WHERE is_real = 1;
(Appid, Client_name) is a row-type expression, short for ROW(Appid, Client_name). Only distinct combinations are counted.
Another trick to get this done without subquery is to use a window function:
SELECT DISTINCT count(*) OVER () AS ct
FROM tbl
WHERE is_real = 1
GROUP BY Appid, Client_name;
But neither is going to be faster than using a subquery (which is not a recursive query):
SELECT count(*) AS ct
FROM (
SELECT 1
FROM tbl
WHERE is_real = 1
GROUP BY Appid, Client_name
) sub;
That's what I would use.
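As a quick check, here is a minimal sketch that runs the subquery version against the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example is self-contained), and the table name tbl follows the queries above. For contrast, it also counts Appid alone, which treats (200, C1) and (200, C2) as one value.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (Appid INT, Client_name TEXT, is_real INT, RTT INT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [(100, "C1", 1, 1), (200, "C1", 1, 6), (200, "C2", 1, 7),
     (100, "C1", 1, 9), (200, "C1", 0, 7)],
)

# Count distinct (Appid, Client_name) pairs among real rows via a grouped subquery.
pair_count = conn.execute("""
    SELECT count(*) AS ct
    FROM (
        SELECT 1
        FROM tbl
        WHERE is_real = 1
        GROUP BY Appid, Client_name
    ) sub
""").fetchone()[0]

# Counting Appid alone collapses (200, C1) and (200, C2) into a single value.
appid_count = conn.execute(
    "SELECT count(DISTINCT Appid) FROM tbl WHERE is_real = 1"
).fetchone()[0]

print(pair_count, appid_count)  # 3 2
```

Which of the two numbers is wanted depends on whether a "unique Appid" means the Appid value or the (Appid, Client_name) pair.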
It's essential to understand the sequence of events in a SELECT query:
Best way to get result count before LIMIT was applied
total number of unique real Appid's in the table
I assume is_real is 1 = true, 0 = false.
SELECT COUNT(DISTINCT Appid)
FROM table
WHERE is_real = 1;
My code looks like this:
SELECT
number,
name,
count(*) as "the number of correct answer"
FROM
table1 NATURAL JOIN table2
WHERE
answer = 'T'
GROUP BY
number,
name
HAVING
count(*) < avg(count(*))
ORDER BY
count(*);
Here I want to find the groups whose count is less than the average count across all groups, but I could not get this to work with HAVING or WHERE. Could anyone help?
How can I select only the row 1 name1 2? The average of the counts is (2+6+7)/3 = 5, and only 2 is less than that.
number name count
1 name1 2
2 name2 6
3 name3 7
I would advise you to never use natural joins. They obfuscate the query and make it a maintenance nightmare.
You can use window functions:
SELECT t.*
FROM (SELECT number, name,
COUNT(*) as num_correct,
AVG(COUNT(*)) OVER () as avg_num_correct
FROM table1 JOIN
table2
USING (?) -- be explicit about the column name
WHERE answer = 'T'
GROUP BY number, name
) t
WHERE num_correct < avg_num_correct;
As with your version of the query, this filters out all groups that have no correct answers.
I would place your current query logic into a CTE, and then tag on the average count in the process:
WITH cte AS (
SELECT number, name, COUNT(*) AS cnt,
AVG(COUNT(*)) OVER () AS avg_cnt
FROM table1
NATURAL JOIN table2
WHERE answer = 'T'
GROUP BY number, name
)
SELECT number, name, cnt AS count
FROM cte
WHERE cnt < avg_cnt;
Here we are using the AVG() function as an analytic function, with the window being the entire aggregated table. This means it will find the average of the counts per group, across all groups (after aggregation). Window functions (almost) always evaluate last.
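To see the logic on the numbers from the question, here is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). Two things are assumptions for the demo: the single table answers is a hypothetical replacement for the table1/table2 join, and the counting and the window average are split into two CTE steps rather than nested as AVG(COUNT(*)) OVER (), which keeps the example portable while computing the same thing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single table replacing the table1/table2 join from the question.
conn.execute("CREATE TABLE answers (number INT, name TEXT, answer TEXT)")
rows = (
    [(1, "name1", "T")] * 2      # 2 correct answers
    + [(2, "name2", "T")] * 6    # 6 correct answers
    + [(3, "name3", "T")] * 7    # 7 correct answers
    + [(1, "name1", "F")] * 3    # wrong answers, filtered out by WHERE
)
conn.executemany("INSERT INTO answers VALUES (?, ?, ?)", rows)

below_avg = conn.execute("""
    WITH counts AS (
        -- Step 1: one row per group with its count.
        SELECT number, name, COUNT(*) AS cnt
        FROM answers
        WHERE answer = 'T'
        GROUP BY number, name
    ),
    with_avg AS (
        -- Step 2: attach the average of those counts to every group row.
        SELECT number, name, cnt, AVG(cnt) OVER () AS avg_cnt
        FROM counts
    )
    SELECT number, name, cnt
    FROM with_avg
    WHERE cnt < avg_cnt
""").fetchall()

print(below_avg)  # [(1, 'name1', 2)]
```

The average of the counts is (2+6+7)/3 = 5, so only the group with 2 correct answers is returned.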
I want to find the max value in a column
ID CName Tot_Val PName
--------------------------------
1 1 100 P1
2 1 10 P2
3 2 50 P2
4 2 80 P1
Above is my table structure. I want to find the row with the max total value for each CName. Of the four rows, IDs 1 and 2 have the same CName but different Tot_Val and PName values. What I expect is the row with the max value among IDs 1 and 2, and likewise among IDs 3 and 4.
Expected result:
ID CName Tot_Val PName
--------------------------------
1 1 100 P1
4 2 80 P1
I need a result like the one shown above.
select Max(Tot_Val), CName
from table1
where PName in ('P1', 'P2')
group by CName
This is the query I have tried, but I am not able to bring PName into the result. If I add PName to the select list and the GROUP BY list, the number of rows multiplies: for example, the result is 100 rows, but with PName added it shows 600 rows. That is the problem.
Can someone please help me to resolve this.
One possible option is to use a subquery. Give each row a number within each CName group ordered by Tot_Val. Then select the rows with a row number equal to one.
select x.*
from ( select mt.ID,
mt.CName,
mt.Tot_Val,
mt.PName,
row_number() over(partition by mt.CName order by mt.Tot_Val desc) as No
from MyTable mt ) x
where x.No = 1;
An alternative would be to use a common table expression (CTE) instead of a subquery to isolate the first result set.
with x as
(
select mt.ID,
mt.CName,
mt.Tot_Val,
mt.PName,
row_number() over(partition by mt.CName order by mt.Tot_Val desc) as No
from MyTable mt
)
select x.*
from x
where x.No = 1;
See both solutions in action in this fiddle.
You can search top-n-per-group for this kind of a query.
There are two common ways to do it. The most efficient method depends on your indexes and data distribution and whether you already have another table with the list of all CName values.
Using ROW_NUMBER
WITH
CTE
AS
(
SELECT
ID, CName, Tot_Val, PName,
ROW_NUMBER() OVER (PARTITION BY CName ORDER BY Tot_Val DESC) AS rn
FROM table1
)
SELECT
ID, CName, Tot_Val, PName
FROM CTE
WHERE rn=1
;
Using CROSS APPLY
WITH
CTE
AS
(
SELECT CName
FROM table1
GROUP BY CName
)
SELECT
A.ID
,A.CName
,A.Tot_Val
,A.PName
FROM
CTE
CROSS APPLY
(
SELECT TOP(1)
table1.ID
,table1.CName
,table1.Tot_Val
,table1.PName
FROM table1
WHERE
table1.CName = CTE.CName
ORDER BY
table1.Tot_Val DESC
) AS A
;
See a very detailed answer on dba.se, Retrieving n rows per group, or here: Get top 1 row of each group.
CROSS APPLY might be as fast as a correlated subquery, but the correlated subquery below often has very good performance (and better than ROW_NUMBER()):
select t.*
from t
where t.tot_val = (select max(t2.tot_val)
from t t2
where t2.cname = t.cname
);
Note: The performance depends on having an index on (cname, tot_val).
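For a side-by-side look at two of the approaches above, here is a minimal sketch that runs the ROW_NUMBER() query and the correlated subquery against the four sample rows. It uses Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for SQL Server, which is an assumption for the demo; CROSS APPLY is left out because SQLite does not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INT, CName INT, Tot_Val INT, PName TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [(1, 1, 100, "P1"), (2, 1, 10, "P2"), (3, 2, 50, "P2"), (4, 2, 80, "P1")],
)

# Number the rows inside each CName group, highest Tot_Val first, keep row 1.
by_row_number = conn.execute("""
    WITH ranked AS (
        SELECT ID, CName, Tot_Val, PName,
               ROW_NUMBER() OVER (PARTITION BY CName ORDER BY Tot_Val DESC) AS rn
        FROM table1
    )
    SELECT ID, CName, Tot_Val, PName
    FROM ranked
    WHERE rn = 1
    ORDER BY ID
""").fetchall()

# Keep every row whose Tot_Val equals the maximum of its CName group.
by_subquery = conn.execute("""
    SELECT t.ID, t.CName, t.Tot_Val, t.PName
    FROM table1 t
    WHERE t.Tot_Val = (SELECT MAX(t2.Tot_Val)
                       FROM table1 t2
                       WHERE t2.CName = t.CName)
    ORDER BY t.ID
""").fetchall()

print(by_row_number)  # [(1, 1, 100, 'P1'), (4, 2, 80, 'P1')]
print(by_subquery)    # same rows here, because no CName has tied maxima
```

The two differ when a group has tied maxima: ROW_NUMBER() keeps exactly one row per group, while the correlated subquery returns every tied row.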
Table1 has the following 2 columns and 4 rows:
Entity Number
------ ------
Car 4
Shop 1
Apple 3
Pear 1
I'd like a single set-based SQL query that produces the desired results below, basically repeating each Entity as many times as the value in its Number column.
I could only do it by looping through the rows one by one, which is neither elegant nor set-based.
Desired result:
Entity
------
Car
Car
Car
Car
Shop
Apple
Apple
Apple
Pear
One method uses recursive CTEs:
with cte as (
select t1.entity, t1.number
from table1 t1
union all
select cte.entity, cte.number - 1
from cte
where cte.number > 1
)
select entity
from cte;
Note: With the default settings, SQL Server stops after 100 levels of recursion, so this is limited to roughly 100 rows per entity. You can use OPTION (MAXRECURSION 0) to get around this.
You can also solve this with a numbers table, but such a problem is a good introduction to recursive CTEs.
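Here is a minimal runnable sketch of the recursive approach on the sample data, using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption for the demo; SQLite spells it WITH RECURSIVE and has no MAXRECURSION option). The column name remaining is made up for illustration. The recursive step stops at 1 so that each entity appears exactly Number times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (entity TEXT, number INT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("Car", 4), ("Shop", 1), ("Apple", 3), ("Pear", 1)],
)

expanded = [row[0] for row in conn.execute("""
    WITH RECURSIVE cte(entity, remaining) AS (
        -- Anchor: one row per entity that should appear at least once.
        SELECT entity, number FROM table1 WHERE number > 0
        UNION ALL
        -- Recursive step: emit another copy while more than one is still owed.
        SELECT entity, remaining - 1 FROM cte WHERE remaining > 1
    )
    SELECT entity FROM cte ORDER BY entity
""")]

print(expanded)
# ['Apple', 'Apple', 'Apple', 'Car', 'Car', 'Car', 'Car', 'Pear', 'Shop']
```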
Use this
;WITH CTE
AS
(
SELECT
SeqNo = 1,
Entity,
Number
FROM YourTable
UNION ALL
SELECT
SeqNo = SeqNo+1,
Entity,
Number
FROM CTE
WHERE SeqNo < Number
)
SELECT
Entity
FROM CTE
ORDER BY 1
A non-recursive solution is to use a fixed sequence of numbers and join the table to it, like this:
WITH numbers
AS
(
SELECT n
FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9), (10)) AS numbers(n)
)
SELECT t.Entity
FROM Table1 AS t
INNER JOIN numbers as n ON t.number >= n.n;
This supports duplicating a row up to 10 times; add more numbers to the list to support larger counts.
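A minimal sketch of the numbers-table join on the sample data, again using Python's built-in sqlite3 module as a stand-in (an assumption for the demo). The extra row ("Bus", 12) is hypothetical and is there only to show the cap: with ten numbers available, it comes back 10 times, not 12.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Entity TEXT, Number INT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [("Car", 4), ("Shop", 1), ("Apple", 3), ("Pear", 1),
     ("Bus", 12)],  # hypothetical row, larger than the numbers list
)

duplicated = [row[0] for row in conn.execute("""
    WITH numbers(n) AS (
        VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)
    )
    SELECT t.Entity
    FROM Table1 AS t
    INNER JOIN numbers AS n ON t.Number >= n.n
""")]

# Each entity matches one numbers row per n <= Number, capped at 10.
copies = Counter(duplicated)
print(copies["Car"], copies["Shop"], copies["Apple"], copies["Pear"], copies["Bus"])
# 4 1 3 1 10
```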
You can use spt_values as source for numbers table
select EntityList.*
from EntityList
, (
select number as n from master..spt_values WHERE Type = 'P' and Number between 1 and (select max(number) from EntityList)
) t
where n <= number
order by entity
I have two tables, custassets and tags. To generate some test data I'd like to do an INSERT INTO a many-to-many table with a SELECT that gets random rows from each (so that a random primary key from one table is paired with a random primary key from the second). To my surprise this isn't as easy as I first thought, so I'm persisting with this to teach myself.
Here's my first attempt. I select 10 custassets and 3 tags, but both are the same in each case. I'd be fine with the first table being fixed, but I'd like to randomise the tags assigned.
SELECT
custassets_rand.id custassets_id,
tags_rand.id tags_rand_id
FROM
(
SELECT id FROM custassets WHERE defunct = false ORDER BY RANDOM() LIMIT 10
) AS custassets_rand
,
(
SELECT id FROM tags WHERE defunct = false ORDER BY RANDOM() LIMIT 3
) AS tags_rand
This produces:
custassets_id | tags_rand_id
---------------+--------------
9849 | 3322 }
9849 | 4871 } this pattern of tag PKs is repeated
9849 | 5188 }
12145 | 3322
12145 | 4871
12145 | 5188
17837 | 3322
17837 | 4871
17837 | 5188
....
I then tried the following approach: doing the second RANDOM() call in the SELECT column list. However this one was worse, as it chooses a single tag PK and sticks with it.
SELECT
custassets_rand.id custassets_id,
(SELECT id FROM tags WHERE defunct = false ORDER BY RANDOM() LIMIT 1) tags_rand_id
FROM
(
SELECT id FROM custassets WHERE defunct = false ORDER BY RANDOM() LIMIT 30
) AS custassets_rand
Result:
custassets_id | tags_rand_id
---------------+--------------
16694 | 1537
14204 | 1537
23823 | 1537
34799 | 1537
36388 | 1537
....
This would be easy in a scripting language, and I'm sure it can be done quite easily with a stored procedure or temporary table. But can I do it with just an INSERT INTO SELECT?
I did think of choosing integer primary keys using a random function, but unfortunately the primary keys for both tables have gaps in the increment sequences (and so an empty row might be chosen in each table). That would have been fine otherwise!
Note that what you are looking for is not a Cartesian product, which would produce n*m rows; rather a random 1:1 association, which produces GREATEST(n,m) rows.
To produce truly random combinations, it's enough to randomize rn for the bigger set:
SELECT c_id, t_id
FROM (
SELECT id AS c_id, row_number() OVER (ORDER BY random()) AS rn
FROM custassets
) x
JOIN (SELECT id AS t_id, row_number() OVER () AS rn FROM tags) y USING (rn);
If arbitrary combinations are good enough, this is faster (especially for big tables):
SELECT c_id, t_id
FROM (SELECT id AS c_id, row_number() OVER () AS rn FROM custassets) x
JOIN (SELECT id AS t_id, row_number() OVER () AS rn FROM tags) y USING (rn);
If the number of rows in the two tables does not match and you do not want to lose rows from the bigger table, use the modulo operator % to join rows from the smaller table multiple times:
SELECT c_id, t_id
FROM (
SELECT id AS c_id, row_number() OVER () AS rn
FROM custassets -- table with fewer rows
) x
JOIN (
SELECT id AS t_id, (row_number() OVER () % small.ct) + 1 AS rn
FROM tags
, (SELECT count(*) AS ct FROM custassets) AS small
) y USING (rn);
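To try the modulo variant without a PostgreSQL instance, here is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in. The ids are made up, with gaps as in the question, and the sketch adds ORDER BY random() to the smaller table's row numbers so the pairing is shuffled; both choices are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE custassets (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY)")

# Hypothetical primary keys with gaps; custassets is the smaller table here.
small_ids = [9849, 12145, 17837]
big_ids = [1537, 3322, 4871, 5188, 6001, 7002, 8003]
conn.executemany("INSERT INTO custassets VALUES (?)", [(i,) for i in small_ids])
conn.executemany("INSERT INTO tags VALUES (?)", [(i,) for i in big_ids])

pairs = conn.execute("""
    SELECT c_id, t_id
    FROM (
        -- Smaller table: row numbers 1..ct in random order.
        SELECT id AS c_id, row_number() OVER (ORDER BY random()) AS rn
        FROM custassets
    ) x
    JOIN (
        -- Bigger table: row numbers wrapped into 1..ct with modulo.
        SELECT id AS t_id, (row_number() OVER () % small.ct) + 1 AS rn
        FROM tags
           , (SELECT count(*) AS ct FROM custassets) AS small
    ) y USING (rn)
""").fetchall()

for c_id, t_id in pairs:
    print(c_id, t_id)
```

Every row of the bigger table appears exactly once, and rows of the smaller table are reused as often as needed, so the result has as many rows as the bigger table.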
Window functions were added with PostgreSQL 8.4.
WITH a_ttl AS (
SELECT count(*) AS ttl FROM custassets c),
b_ttl AS (
SELECT count(*) AS ttl FROM tags),
rows AS (
SELECT gs.*
FROM generate_series(1,
(SELECT max(ttl) AS ttl FROM
(SELECT ttl FROM a_ttl UNION SELECT ttl FROM b_ttl) AS m))
AS gs(row)),
tab_a_rand AS (
SELECT id AS custassets_id, row_number() OVER (order by random()) as row
FROM custassets),
tab_b_rand AS (
SELECT id, row_number() OVER (order by random()) as row
FROM tags)
SELECT a.custassets_id, b.id
FROM rows r
JOIN a_ttl ON 1=1 JOIN b_ttl ON 1=1
LEFT JOIN tab_a_rand a ON a.row = (r.row % a_ttl.ttl)+1
LEFT JOIN tab_b_rand b ON b.row = (r.row % b_ttl.ttl)+1
ORDER BY 1,2;
You can test this query on SQL Fiddle.
Here is a different approach that picks a single combination from 2 tables at random, assuming two tables a and b, both with primary key id. The tables needn't be the same size, and the second row is chosen independently of the first, which might not be that important for test data.
SELECT * FROM a, b
WHERE a.id = (
SELECT id
FROM a
OFFSET (
SELECT floor(random() * (SELECT count(*) FROM a))
)
LIMIT 1)
AND b.id = (
SELECT id
FROM b
OFFSET (
SELECT floor(random() * (SELECT count(*) FROM b))
)
LIMIT 1);
Tested with two tables, one with 7000 rows and one with 100k rows: the result came back immediately. For more than one result, you have to call the query repeatedly; increasing the LIMIT and changing x.id = to x.id IN would produce (aA, aB, bA, bB) result patterns.
It bugs me that after all these years of relational databases, there doesn't seem to be very good cross database ways of doing things like this. The MSDN article http://msdn.microsoft.com/en-us/library/cc441928.aspx seems to have some interesting ideas, but of course that's not PostgreSQL. And even then, their solution requires a single pass, when I'd think it ought to be able to be done without the scan.
I can imagine a few ways that might work without a pass (in selection), but it would involve creating another table that maps your table's primary keys to random numbers (or to linear sequences that you later randomly select, which in some ways may actually be better), and of course, that may have issues as well.
I realize this is probably a non-useful comment, I just felt I needed to rant a bit.
If you just want to get a random set of rows from each side, use a pseudo-random number generator. I would use something like:
select *
from (select a.*, row_number() over (order by NULL) as rownum -- NULL may not work, "(SELECT NULL)" works in MSSQL
from a
) a cross join
(select b.*, row_number() over (order by NULL) as rownum
from b
) b
where a.rownum <= 30 and b.rownum <= 30
This is doing a Cartesian product, which returns 900 rows assuming a and b each have at least 30 rows.
However, I interpreted your question as getting random combinations. Once again, I'd go for the pseudo-random approach.
select *
from (select a.*, row_number() over (order by NULL) as rownum -- NULL may not work, "(SELECT NULL)" works in MSSQL
from a
) a cross join
(select b.*, row_number() over (order by NULL) as rownum
from b
) b
where mod(a.rownum*107+b.rownum*257+17, 101) < <some value>
This lets you get combinations among arbitrary rows.
Just a plain Cartesian product ON random() appears to work reasonably well. Simple as can be...
-- Cartesian product
-- EXPLAIN ANALYZE
INSERT INTO dirgraph(point_from,point_to,costs)
SELECT p1.the_point , p2.the_point, (1000*random() ) +1
FROM allpoints p1
JOIN allpoints p2 ON random() < 0.002
;
I found a question that was very similar to this one, but using features that seem exclusive to Oracle. I'm looking to do this in SQL Server.
I have a table like this:
MyTable
--------------------
MyTableID INT PK
UserID INT
Counter INT
Each user can have multiple rows, with different values for Counter in each row. I need to find the rows with the highest Counter value for each user.
How can I do this in SQL Server 2005?
The best I can come up with is a query that returns the MAX(Counter) for each UserID, but I need the entire row because of other data in this table not shown in my table definition for simplicity's sake.
EDIT: It has come to my attention from some of the answers in this post, that I forgot an important detail. It is possible to have 2+ rows where a UserID can have the same MAX counter value. Example below updated for what the expected data/output should be.
With this data:
MyTableID UserID Counter
--------- ------- --------
1 1 4
2 1 7
3 4 3
4 11 9
5 11 3
6 4 6
...
9 11 9
I want these results. For the duplicate MAX values, select the first occurrence in whatever order SQL Server returns them. Which rows are returned isn't important in this case, as long as the UserID/Counter pairs are distinct:
MyTableID UserID Counter
--------- ------- --------
2 1 7
4 11 9
6 4 6
I like to use a Common Table Expression for that case, with a suitable ROW_NUMBER() function in it:
WITH MaxPerUser AS
(
SELECT
MyTableID, UserID, Counter,
ROW_NUMBER() OVER(PARTITION BY userid ORDER BY Counter DESC) AS 'RowNumber'
FROM dbo.MyTable
)
SELECT MyTableID, UserID, Counter
FROM MaxPerUser
WHERE RowNumber = 1
That partitions the data by UserID, orders it by Counter (descending) for each user, and then numbers the rows starting with 1 for each user. Select only the rows with a RowNumber of 1 and you have your max values per user.
It's that easy :-) And I get results something like this:
MyTableID UserID Counter
2 1 7
6 4 6
4 11 9
Only one entry per user, no matter how many rows per user happen to have the same max value.
I think this will help you.
SELECT distinct(a.userid), MAX(a.counter) as counter
FROM mytable a INNER JOIN mytable b ON a.mytableid = b.mytableid
GROUP BY a.userid
There are several ways to do this. Take a look at Including an Aggregated Column's Related Values; several methods are shown there, including the performance differences.
Here is one example
select t1.*
from(
select UserID, max(counter) as MaxCount
from MyTable
group by UserID) t2
join MyTable t1 on t2.UserID =t1.UserID
and t1.counter = t2.MaxCount
Try this... I'm pretty sure this is the only way to truly make sure you get one row per User.
SELECT MT.*
FROM MyTable MT
INNER JOIN (
SELECT MAX(MID.MyTableId) AS MaxMyTableId,
MID.UserId
FROM MyTable MID
INNER JOIN (
SELECT MAX(Counter) AS MaxCounter, UserId
FROM MyTable
GROUP BY UserId
) AS MC
ON (MID.UserId = MC.UserId
AND MID.Counter = MC.MaxCounter)
GROUP BY MID.UserId
) AS MID
ON (MT.UserId = MID.UserId
AND MT.MyTableId = MID.MaxMyTableId)
select m.*
from MyTable m
inner join (
select UserID, max(Counter) as MaxCounter
from MyTable
group by UserID
) mm on m.UserID = mm.UserID and m.Counter = mm.MaxCounter
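To make the tie behaviour concrete, here is a minimal sketch that runs the ROW_NUMBER() approach and the join-on-MAX approach against the rows shown in the question (the rows elided by "..." are simply left out). It uses Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for SQL Server, and it adds MyTableID as a tie-breaker in the window ordering so the output is deterministic; both are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (MyTableID INTEGER PRIMARY KEY, UserID INT, Counter INT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(1, 1, 4), (2, 1, 7), (3, 4, 3), (4, 11, 9),
     (5, 11, 3), (6, 4, 6), (9, 11, 9)],
)

# ROW_NUMBER(): exactly one row per user, even when the max is tied.
one_per_user = conn.execute("""
    WITH MaxPerUser AS (
        SELECT MyTableID, UserID, Counter,
               ROW_NUMBER() OVER (PARTITION BY UserID
                                  ORDER BY Counter DESC, MyTableID) AS RowNumber
        FROM MyTable
    )
    SELECT MyTableID, UserID, Counter
    FROM MaxPerUser
    WHERE RowNumber = 1
    ORDER BY UserID
""").fetchall()

# Join on MAX(Counter): every row that carries the user's max value.
all_ties = conn.execute("""
    SELECT m.MyTableID, m.UserID, m.Counter
    FROM MyTable m
    INNER JOIN (
        SELECT UserID, MAX(Counter) AS MaxCounter
        FROM MyTable
        GROUP BY UserID
    ) mm ON m.UserID = mm.UserID AND m.Counter = mm.MaxCounter
    ORDER BY m.UserID, m.MyTableID
""").fetchall()

print(one_per_user)  # [(2, 1, 7), (6, 4, 6), (4, 11, 9)]
print(all_ties)      # [(2, 1, 7), (6, 4, 6), (4, 11, 9), (9, 11, 9)]
```

User 11 has two rows with Counter 9, so the plain join returns both of them, while the ROW_NUMBER() version keeps only one.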