SQL: select two columns from one table generating independent output

I would like to generate output where I count two columns from the same table, each with a different condition.
I have this SQL statement, which works:
select Datum, Count(ID), Count(Fläche)
FROM gustavo
where Fläche > 200
Group by Datum;
But it only gives me the sizes over 480 and the IDs over 480 in both columns. In the first column I would like to have the count of all IDs, though. Any idea how that would work?
Thanks a lot

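Try this (not tested). The WHERE clause filters rows before anything is counted, so every count only sees rows with Fläche > 200. Moving the conditions into the aggregates lets COUNT(ID) see all IDs:
SELECT
Datum,
COUNT(ID) AS all_ID,
SUM(CASE WHEN ID > 480 THEN 1 ELSE 0 END) AS ID_480,
SUM(CASE WHEN Fläche > 200 THEN 1 ELSE 0 END) AS Flache_200
FROM
gustavo
GROUP BY
Datum;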
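The conditional-aggregation idea can be tried against an in-memory SQLite database from Python. This is a minimal sketch only. The sample rows and the default threshold of 200 are hypothetical, and only the table and column names come from the question. `COUNT(ID)` counts every row in the group, while the `CASE` expression adds 1 only for rows that meet the condition, so no `WHERE` clause is needed.

```python
import sqlite3

# Hypothetical sample data: (ID, Datum, Fläche)
SAMPLE_ROWS = [
    (1, "2020-01-01", 150),
    (2, "2020-01-01", 250),
    (3, "2020-01-01", 300),
    (4, "2020-01-02", 100),
]


def counts_per_datum(rows, threshold=200):
    """Return (Datum, all_ID, Flache_over) for each date, ordered by date."""
    conn = sqlite3.connect(":memory:")
    try:
        # "Fläche" is quoted so the non-ASCII identifier is unambiguous.
        conn.execute('CREATE TABLE gustavo (ID INTEGER, Datum TEXT, "Fläche" REAL)')
        conn.executemany("INSERT INTO gustavo VALUES (?, ?, ?)", rows)
        query = """
            SELECT Datum,
                   COUNT(ID) AS all_ID,  -- every row in the group
                   SUM(CASE WHEN "Fläche" > ? THEN 1 ELSE 0 END) AS Flache_over
            FROM gustavo
            GROUP BY Datum
            ORDER BY Datum
        """
        return conn.execute(query, (threshold,)).fetchall()
    finally:
        conn.close()


print(counts_per_datum(SAMPLE_ROWS))
# [('2020-01-01', 3, 2), ('2020-01-02', 1, 0)]
```

Because both counts come from the same pass over the table, the two conditions stay independent of each other.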

Related

Most Efficient way to Search Massive Redshift Table for Duplicate Values

I have a large Redshift table (hundreds of millions of rows, with ~50 columns per row).
I need to find rows that have duplicate values in one specific column.
Example:
If my table has the columns 'column_of_interest' and 'date_time', then among those hundreds of millions of rows I need to find every value of 'column_of_interest' that appears more than once within a given 'date_time' range.
For example:
column_of_interest date_time
ROW 1: ABCD-1234 165895896565
ROW 2: FCEG-3434 165895896577
ROW 3: ABCD-1234 165895986688
ROW 4: ZZZZ-9999 165895986689
ROW 5: ZZZZ-9999 165895987790
In the example above, ROW 1 and ROW 3 have the same column_of_interest, so I would like that value returned. ROW 4 and ROW 5 also share a value, so I would like that one returned as well.
So the end result would be:
duplicates
ABCD-1234
ZZZZ-9999
I have found a few things online, but the table is so large that the query times out before any results are returned. Am I going about this the wrong way? Here are a couple of queries I tried just to get results back (but they time out before returning):
SELECT column_of_interest, COUNT(*)
FROM my_table
GROUP BY column_of_interest
HAVING COUNT(*) > 1
WHERE date_time >= 1601510400000 AND date_time < 1601596800000
LIMIT 200
SELECT a.*
FROM my_table a
JOIN (SELECT column_of_interest, COUNT(*)
FROM my_table
GROUP BY column_of_interest
HAVING count(*) > 1 ) b
ON a.column_of_interest = b.column_of_interest
ORDER BY a.column_of_interest
LIMIT 200
This should be a fine method, and it should not time out. Your first version has a syntax error: the WHERE clause has to come before GROUP BY and HAVING.
So try:
SELECT column_of_interest, COUNT(*)
FROM my_table
WHERE date_time >= 1601510400000 AND date_time < 1601596800000
GROUP BY column_of_interest
HAVING COUNT(*) > 1
LIMIT 200
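The clause order of the corrected query (filter rows with `WHERE`, group, then filter groups with `HAVING`) can be checked on the sample rows from the question. This minimal sketch uses an in-memory SQLite table as a stand-in for Redshift, so it says nothing about Redshift performance. It adds `ORDER BY` and drops `LIMIT 200` so the output is deterministic. The range bounds in the usage line are chosen for illustration.

```python
import sqlite3

# Sample rows from the question: (column_of_interest, date_time)
SAMPLE_ROWS = [
    ("ABCD-1234", 165895896565),
    ("FCEG-3434", 165895896577),
    ("ABCD-1234", 165895986688),
    ("ZZZZ-9999", 165895986689),
    ("ZZZZ-9999", 165895987790),
]


def duplicates_in_range(rows, start, end):
    """Values of column_of_interest that occur more than once in [start, end)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE my_table (column_of_interest TEXT, date_time INTEGER)")
        conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
        query = """
            SELECT column_of_interest, COUNT(*)
            FROM my_table
            WHERE date_time >= ? AND date_time < ?  -- filter rows first
            GROUP BY column_of_interest
            HAVING COUNT(*) > 1                     -- then filter groups
            ORDER BY column_of_interest
        """
        return conn.execute(query, (start, end)).fetchall()
    finally:
        conn.close()


print(duplicates_in_range(SAMPLE_ROWS, 165895896000, 165895988000))
# [('ABCD-1234', 2), ('ZZZZ-9999', 2)]
```

Narrowing the range so that it excludes ROW 3 leaves no value with more than one occurrence, and the query returns an empty list.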

SQL: Select Top 2 Query is Excluding Records with more than 2 Records

I just joined after having a problem writing a query in MS Access. I am trying to write a query that pulls the first two valid samples from a list of replicated sample results and then averages the sample values. I have written a query that does pull samples with exactly two valid results and averages them. However, my query doesn't pull samples that have more than two valid results. Here's my query:
SELECT temp_platevalid_table.samp_name AS samp_name, avg (temp_platevalid_table.mean_conc) AS fin_avg, count(temp_platevalid_table.samp_valid) AS sample_count
FROM Temp_PlateValid_table
WHERE (Temp_PlateValid_table.id In (SELECT TOP 2 S.id
FROM Temp_PlateValid_table as S
WHERE S.samp_name = S.samp_name and s.samp_valid=1 and S.samp_valid=1
ORDER BY ID))
GROUP BY Temp_PlateValid_table.samp_name
HAVING ((Count(Temp_PlateValid_table.samp_valid))=2)
ORDER BY Temp_PlateValid_table.samp_name;
Here's an example of what I'm trying to do:
ID Samp_Name Samp_Valid Mean_Conc
1 54d2d2 1 15
2 54d2d2 1 20
3 54d2d2 1 25
The average mean_conc should be 17.5, however, with my current query, I wouldn't receive a value at all for 54d2d2. Is there a way to tweak my query so that I get a value for samples that have more than two valid values? Please note that I'm using MS Access, so I don't think I can use fancier SQL code (partition by, etc.).
Thanks in advance for your help!
Is this what you want?
select pv.samp_name, avg(pv.mean_conc)
from Temp_PlateValid_table pv
where pv.samp_valid = 1 and
pv.id in (select top 2 pv2.id
from Temp_PlateValid_table as pv2
where pv2.samp_name = pv.samp_name and pv2.samp_valid = 1
order by pv2.id
)
group by pv.samp_name;
You might need avg(pv.mean_conc * 1.0).
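The correlated "first two valid rows per sample" subquery can be tried in SQLite as a rough sketch. SQLite has no `TOP`, so `ORDER BY ... LIMIT 2` is swapped in for Access's `TOP 2`. The three `54d2d2` rows come from the question. The `abc` rows are hypothetical and include an invalid sample, to show that it is skipped.

```python
import sqlite3

# (id, samp_name, samp_valid, mean_conc); "abc" rows are hypothetical.
SAMPLE_ROWS = [
    (1, "54d2d2", 1, 15),
    (2, "54d2d2", 1, 20),
    (3, "54d2d2", 1, 25),
    (4, "abc", 0, 100),
    (5, "abc", 1, 10),
    (6, "abc", 1, 30),
]


def first_two_valid_avg(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Temp_PlateValid_table "
            "(id INTEGER, samp_name TEXT, samp_valid INTEGER, mean_conc REAL)"
        )
        conn.executemany("INSERT INTO Temp_PlateValid_table VALUES (?, ?, ?, ?)", rows)
        query = """
            SELECT pv.samp_name, AVG(pv.mean_conc)
            FROM Temp_PlateValid_table pv
            WHERE pv.samp_valid = 1
              AND pv.id IN (SELECT pv2.id
                            FROM Temp_PlateValid_table pv2
                            WHERE pv2.samp_name = pv.samp_name  -- correlated
                              AND pv2.samp_valid = 1
                            ORDER BY pv2.id
                            LIMIT 2)                            -- Access: TOP 2
            GROUP BY pv.samp_name
            ORDER BY pv.samp_name
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(first_two_valid_avg(SAMPLE_ROWS))
# [('54d2d2', 17.5), ('abc', 20.0)]
```

The subquery is re-evaluated for each outer row's samp_name. That is why a sample with three valid results is no longer excluded, as it was by the original HAVING Count(...) = 2 filter.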

Postgresql: Query to know which fraction of the values are larger/smaller

I would like to query my database to know which fraction/percentage of the elements of a table are larger/smaller than a given value.
For instance, let's say I have a table shopping_list with the following schema:
id integer
name text
price double precision
with contents:
id name price
1 banana 1
2 book 20
3 chicken 5
4 chocolate 3
I am now going to buy a new item with price 4, and I would like to know where this new item will be ranked in the shopping list. In this case the element will be greater than 50% of the elements.
I know I can run two queries and count the number of elements, e.g.:
-- returns = 4
SELECT COUNT(*)
FROM shopping_list;
-- returns = 2
SELECT COUNT(*)
FROM shopping_list
WHERE price > 4;
But I would like to do it with a single query to avoid post-processing the results.
If you just want both counts in a single query, use UNION:
SELECT COUNT(*), 'total'
FROM shopping_list
UNION
SELECT COUNT(*),'greater'
FROM shopping_list
WHERE price > 4;
The simplest way is to use avg():
SELECT AVG((price > 4)::int)
FROM shopping_list;
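The AVG trick works because each comparison becomes 0 or 1, so their mean is the fraction of rows that satisfy it. Here is a minimal sketch in SQLite rather than PostgreSQL. In SQLite a comparison already evaluates to an integer 0 or 1, so no cast is needed there. The data is the shopping_list contents from the question.

```python
import sqlite3

# shopping_list contents from the question: (id, name, price)
SAMPLE_ROWS = [(1, "banana", 1), (2, "book", 20), (3, "chicken", 5), (4, "chocolate", 3)]


def fraction_greater(rows, new_price):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE shopping_list (id INTEGER, name TEXT, price REAL)")
        conn.executemany("INSERT INTO shopping_list VALUES (?, ?, ?)", rows)
        # price > ? yields 0 or 1 per row; AVG of those is the fraction above new_price.
        (fraction,) = conn.execute(
            "SELECT AVG(price > ?) FROM shopping_list", (new_price,)
        ).fetchone()
        return fraction
    finally:
        conn.close()


print(fraction_greater(SAMPLE_ROWS, 4))  # 0.5
```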
One way to get both results is as follows:
select count(*) as total,
(select count(*) from shopping_list where price > 4) as greater
from shopping_list
It will get both results in a single row, with the names you specified. It does, however, involve a query within a query.
I found the aggregate function PERCENT_RANK which does exactly what I wanted:
SELECT PERCENT_RANK(4) WITHIN GROUP (ORDER BY price)
FROM shopping_list;
-- returns 0.5
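The hypothetical-set PERCENT_RANK adds the given value to the group as an extra row and returns (rank - 1) / (total_rows - 1) for that row. Its rank is 1 plus the number of rows that sort strictly before it, and ties share a rank. The plain-Python sketch below reproduces that arithmetic so the 0.5 can be checked by hand. Returning 0 for an empty group (only the hypothetical row) is an assumption based on percent_rank being 0 for a single-row set.

```python
def hypothetical_percent_rank(value, values):
    """Relative rank of `value` if it were added to `values` (ascending order)."""
    if not values:
        return 0.0  # assumption: only the hypothetical row, so rank 0
    rows_before = sum(1 for v in values if v < value)  # ties do not count
    rank = rows_before + 1
    total_rows = len(values) + 1  # the hypothetical row is included
    return (rank - 1) / (total_rows - 1)


prices = [1, 20, 5, 3]
print(hypothetical_percent_rank(4, prices))  # 0.5
```

For the question's data, two prices (1 and 3) sort before 4, so the result is 2 / 4 = 0.5.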

How to get three count values from same column using SQL in Access?

I have a table with an integer column from which I am trying to get several counts. Basically, I need four separate counts from the same column. The first is the number of records whose value in this column lies between two values such as 213 and 9999, inclusive. The other three are counts of records whose values fall in other ranges of the same column. I've tried queries like...
SELECT (SELECT Count(ID) FROM view1 WHERE MyIntColumn BETWEEN 213 AND 9999) AS Value1,
(SELECT Count(ID) FROM view1 WHERE MyIntColumn BETWEEN 500 AND 600) AS Value2 FROM view1;
For example, there are ten records with a value between 213 and 9999 in this column. This query returns 10, but it returns that same value 618 times, which is the total number of records in the table. How can I make it return just one record with 10?
Use the IIf() function instead of CASE WHEN:
SELECT SUM(IIf(MyIntColumn BETWEEN 213 AND 9999, 1, 0)) AS Value1, SUM(IIf(MyIntColumn BETWEEN 500 AND 600, 1, 0)) AS Value2 FROM view1;
P.S.: When I worked with Access, I had the IIf() resolve to 1 or 0 and then used SUM() to get the counts. Roundabout, but it worked better with aggregation since it avoided nulls.
SELECT
COUNT(CASE
WHEN MyIntColumn >= 213 AND MyIntColumn <= 9999
THEN MyIntColumn
ELSE NULL
END) AS FirstValue
, ??? AS SecondValue
, ??? AS ThirdValue
, ??? AS FourthValue
FROM view1
This doesn't need nesting, a CTE or anything else. Just define your condition with CASE inside the argument of COUNT.
I don't really understand what you want in the second, third and fourth columns. It sounds very similar to the first one.
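Several conditional counts can come back in a single row from one pass over the table. This minimal sketch runs in SQLite. The MyIntColumn values are hypothetical, and only the two ranges from the question are shown. `CASE` without `ELSE` yields NULL for non-matching rows, and `COUNT` skips NULLs. In Access, where CASE is not available, `IIf(condition, 1, Null)` would play the same role.

```python
import sqlite3

# Hypothetical values for MyIntColumn; table and column names are from the question.
SAMPLE_VALUES = [100, 213, 550, 600, 601, 9999, 10000]


def range_counts(values):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE view1 (ID INTEGER PRIMARY KEY, MyIntColumn INTEGER)")
        conn.executemany("INSERT INTO view1 (MyIntColumn) VALUES (?)", [(v,) for v in values])
        # One output row; each COUNT only sees the non-NULL results of its CASE.
        return conn.execute(
            """
            SELECT COUNT(CASE WHEN MyIntColumn BETWEEN 213 AND 9999 THEN 1 END) AS Value1,
                   COUNT(CASE WHEN MyIntColumn BETWEEN 500 AND 600 THEN 1 END) AS Value2
            FROM view1
            """
        ).fetchall()
    finally:
        conn.close()


print(range_counts(SAMPLE_VALUES))  # [(5, 2)]
```

BETWEEN is inclusive on both ends, so 213, 600 and 9999 are counted in their respective ranges.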
Reformatted, your query looks like:
SELECT (
SELECT Count(ID)
FROM view1
WHERE MyIntColumn BETWEEN 213 AND 9999
) AS Value1
FROM view1;
So you are selecting a subquery expression that is not related to the outer query. For each row in view1, you calculate the number of rows in view1.
Instead, try to do the calculation once. You just have to remove your outer query:
SELECT Count(ID)
FROM view1
WHERE MyIntColumn BETWEEN 213 AND 9999;
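The difference between the two query shapes can be shown directly. In this minimal SQLite sketch with hypothetical values, the uncorrelated scalar subquery repeats the same count once for every row of the outer `FROM view1`. Dropping the outer query computes the count once.

```python
import sqlite3


def compare_queries(values):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE view1 (ID INTEGER PRIMARY KEY, MyIntColumn INTEGER)")
        conn.executemany("INSERT INTO view1 (MyIntColumn) VALUES (?)", [(v,) for v in values])
        # Uncorrelated subquery plus an outer FROM: one identical row per table row.
        repeated = conn.execute(
            "SELECT (SELECT COUNT(ID) FROM view1 "
            "        WHERE MyIntColumn BETWEEN 213 AND 9999) AS Value1 "
            "FROM view1"
        ).fetchall()
        # Outer query removed: the count is computed once and returned as one row.
        once = conn.execute(
            "SELECT COUNT(ID) FROM view1 WHERE MyIntColumn BETWEEN 213 AND 9999"
        ).fetchall()
        return repeated, once
    finally:
        conn.close()


repeated, once = compare_queries([100, 300, 5000, 20000])
print(repeated)  # [(2,), (2,), (2,), (2,)]
print(once)      # [(2,)]
```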
An OLE DB connection to MS Access does not support the keywords CASE and WHEN.
You can only use the IIf() function to count two, three or more values in the same column:
SELECT Attendance.StudentName,
Count(IIf([Attendance]![Yes_No]='Yes',1,Null)) AS [Yes],
Count(IIf([Attendance]![Yes_No]='No',1,Null)) AS [No],
Count(IIf([Attendance]![Yes_No]='Not',1,Null)) AS [Not],
Count([Attendance]![Yes_No]) AS Total
FROM Attendance
GROUP BY Attendance.StudentName;

Split a query result based on the result count

I have a query based on basic criteria that will return X number of records on any given day.
I'm trying to check the record count of the basic query and then split its records into 2 buckets. Each bucket gets a percentage of the total X, and the percentages depend on how large X is.
For example:
Query A returns 3500 records.
If the number of records returned from Query A is <= 3000, then split the 3500 records into a 40% / 60% split (1,400 / 2,100).
If the number of records returned from Query A is >= 3001 and <= 50,000, then split the records 10% / 90%. And so on.
I want the actual records returned, not just a single row with the computed numbers in it.
I'm not sure how you want to display the different parts of the result set, so I've just added an extra column (part) that contains 1 if the row belongs to the first part and 2 if it belongs to the second.
select z.*
, case
when cnt_all <= 3000 and cnt <= 40
then 1
when (cnt_all between 3001 and 50000) and (cnt <= 10)
then 1
else 2
end part
from (select t.*
, 100 * (count(col1) over (order by col1) / count(col1) over ()) cnt
, count(col1) over() cnt_all
from split_rowset t
order by col1
) z
For better usability, you can create a view from the query above and then query that view, filtering on the part column.
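The bucketing logic of that CASE expression can be traced in plain Python. This is a sketch under stated assumptions, not the answer's SQL. The keys are assumed to be distinct, so the running count equals the row position. Totals above 50,000 put every row in part 2, mirroring the answer's ELSE branch, because the question's "And so on" leaves those tiers unspecified. The function name split_into_parts is hypothetical.

```python
def split_into_parts(keys):
    """Return (key, part) pairs: part 1 for the first slice of rows, else part 2.

    Mirrors: cnt = 100 * running_count / total; part 1 if cnt <= first_pct.
    Assumes distinct keys (hypothetical helper, not from the answer).
    """
    ordered = sorted(keys)
    total = len(ordered)
    if total <= 3000:
        first_pct = 40
    elif total <= 50000:
        first_pct = 10
    else:
        first_pct = 0  # unspecified tier: everything falls to part 2, like ELSE 2
    parts = []
    for position, key in enumerate(ordered, start=1):
        # Same test as 100 * position / total <= first_pct, kept in integers.
        part = 1 if 100 * position <= first_pct * total else 2
        parts.append((key, part))
    return parts


parts = split_into_parts(range(3500))
print(sum(1 for _, p in parts if p == 1))  # 350
```

Comparing in integers avoids floating-point rounding right at the boundary rows, such as row 1,200 of 3,000 or row 350 of 3,500.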