any way to use IN and COUNT in rewriting this query? - sql

select count(patientNUM) as totalpatients
from [dbo] (nolock)
where patientId in (
'97210219',
'97210221',
'97210222'
)
Each patientId has an expected number of rows: 50, 100, or 20. I want to go through the rows for each patientId and list it as partial or full. For example, if there are only 40 of the expected 50 rows for a patientId, it should be listed as partial; if all 50 are there, it should be listed as full. Is there a way to use COUNT and IN at the same time?
So basically I want to return two columns: patientId, and fullorpartial in the second column.
Is there a way to go through the rows, count them per patientId, and then compare the result in a second column?

You need to know the "capacity" as well as the patientId. I would suggest a derived table:
select t.patientId,
       (case when count(*) < v.capacity then 'partial'
             when count(*) = v.capacity then 'full'
        end) as full_or_partial
from t join
     (values ('97210219', 50),
             ('97210221', 100),
             ('97210222', 20)
     ) v(patientId, capacity)
     on v.patientId = t.patientId
group by t.patientId, v.capacity;

I don't know exactly what your data looks like or what you want, but try this. Using OVER is a good choice:
select patientId,count(patientNUM) over(partition by patientId) as totalpatients
from [dbo] (nolock)
where patientId in (
'97210219',
'97210221',
'97210222'
)
This will count how many patientNUM rows exist for each patientId.
The 'partialorfull' column can then be achieved with a CASE expression:
case
when patientId = '97210219' and totalpatients < 50 then 'partial'
when ...... --condition keep going on
else 'full'
end as partialorfull
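Putting the window count and the CASE together, a minimal sketch could look like this (the table name [dbo].[Patients] is a placeholder, and the capacity values 50/100/20 are the ones from the question):
-- Sketch only: replace [dbo].[Patients] with the real table name.
select distinct
       patientId,
       case
           when count(patientNUM) over (partition by patientId) <
                case patientId
                    when '97210219' then 50
                    when '97210221' then 100
                    when '97210222' then 20
                end
           then 'partial'
           else 'full'
       end as fullorpartial
from [dbo].[Patients] with (nolock)
where patientId in ('97210219', '97210221', '97210222')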

filtering rows by checking a condition for group in one statement only

I have the following statement:
SELECT
(CONVERT(VARCHAR(10), f1, 120)) AS ff1,
CONVERT(VARCHAR(10), f2, 103) AS ff2,
...,
Bonus,
Malus,
ClientID
FROM
my_table
WHERE
<my_conditions>
ORDER BY
f1 ASC
This select returns several rows for each ClientID. I have to filter out all the rows with the Clients that don't have any row with non-empty Bonus or Malus.
How can I do it by changing this select by one statement only and without duplicating all this select?
I could store the result in a #temp_table, then group the data and use the result of the grouping to filter the temp table - BUT I should do it in one statement only.
I could perform this select twice - grouping it the first time and then filtering the rows based on the grouping result. BUT I don't want to select it twice.
Maybe a CTE (Common Table Expression) could be useful here to perform the select only once, using the result first for grouping and then for selecting the desired rows based on the grouping result.
Any more elegant solution for this problem?
Thank you in advance!
Just to clarify what the SQL should do I add an example:
ClientID Bonus Malus
1 1
1
1 1
2
2
3 4
3 5
3 1
So in this case I don't want the ClientID=2 rows to appear (they are not interesting). The result should be:
ClientID Bonus Malus
1 1
1
1 1
3 4
3 5
3 1
SELECT Bonus,
Malus,
ClientID
FROM my_table
WHERE ClientID not in
(
select ClientID
from my_table
group by ClientID
having count(Bonus) = 0 and count(Malus) = 0
)
A CTE will work fine, but in effect its contents will be executed twice because they are being cloned into all the places where the CTE is being used. This can be a net performance win or loss compared to using a temp table. If the query is very expensive it might come out as a loss. If it is cheap or if many rows are being returned the temp table will lose the comparison.
Which solution is better? Look at the execution plans and measure the performance.
The CTE is the easier, more maintainable and less redundant alternative.
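For reference, a minimal sketch of the CTE variant (column list abbreviated, and assuming ClientID is never NULL) might look like:
WITH base AS (
    SELECT CONVERT(VARCHAR(10), f1, 120) AS ff1,
           ...,
           Bonus,
           Malus,
           ClientID
    FROM my_table
    WHERE <my_conditions>
)
SELECT b.*
FROM base b
WHERE b.ClientID IN (SELECT ClientID
                     FROM base
                     GROUP BY ClientID
                     HAVING COUNT(Bonus) > 0 OR COUNT(Malus) > 0)
ORDER BY b.ff1 ASC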
You haven't specified what the data types of the Bonus and Malus columns are. If they're integer (or can be converted to integer), then the query below should be helpful. It calculates the sum of both columns for each ClientID. These sums are the same for every detail line of the same client, so we can use them in the WHERE condition. SUM() OVER() is a window function and can't be used in a WHERE clause, so I had to wrap your select list in a parent query just because of the syntax.
SELECT *
FROM (
SELECT
CONVERT(VARCHAR(10), f1, 120) AS ff1,
CONVERT(VARCHAR(10), f2, 103) AS ff2,
...,
Bonus,
Malus,
ClientID,
SUM(Bonus) OVER (PARTITION BY ClientID) AS ClientBonusTotal,
SUM(Malus) OVER (PARTITION BY ClientID) AS ClientMalusTotal
FROM
my_table
WHERE
<my_conditions>
) a
WHERE ISNULL(a.ClientBonusTotal, 0) <> 0 OR ISNULL(a.ClientMalusTotal, 0) <> 0
ORDER BY f1 ASC

Use google bigquery to build histogram graph

How can write a query that makes histogram graph rendering easier?
For example, we have 100 million people with ages, we want to draw the histogram/buckets for age 0-10, 11-20, 21-30 etc... What does the query look like?
Has anyone done it? Did you try to connect the query result to google spreadsheet to draw the histogram?
You could also use the quantiles aggregation operator to get a quick look at the distribution of ages.
SELECT
quantiles(age, 10)
FROM mytable
Each row of this query would correspond to the age at that point in the list of ages. The first result is the age 1/10th of the way through the sorted list of ages, the second is the age 2/10ths of the way through, and so on.
See the 2019 update, with #standardSQL --Fh
The subquery idea works, as does "CASE WHEN" and then doing a group by:
SELECT COUNT(field1), bucket
FROM (
SELECT field1, CASE WHEN age >= 0 AND age < 10 THEN 1
WHEN age >= 10 AND age < 20 THEN 2
WHEN age >= 20 AND age < 30 THEN 3
...
ELSE -1 END as bucket
FROM table1)
GROUP BY bucket
Alternately, if the buckets are regular -- you could just divide and cast to an integer:
SELECT COUNT(field1), bucket
FROM (
SELECT field1, INTEGER(age / 10) as bucket FROM table1)
GROUP BY bucket
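The same regular-bucket idea in #standardSQL, as a sketch (table and column names follow the example above):
#standardSQL
SELECT bucket, COUNT(field1) AS cnt
FROM (
  SELECT field1, CAST(FLOOR(age / 10) AS INT64) AS bucket
  FROM table1
)
GROUP BY bucket
ORDER BY bucket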
With #standardSQL and an auxiliary stats query, we can define the range the histogram should look into.
Here for the time to fly between SFO and JFK - with 10 buckets:
WITH data AS (
SELECT *, ActualElapsedTime datapoint
FROM `fh-bigquery.flights.ontime_201903`
WHERE FlightDate_year = "2018-01-01"
AND Origin = 'SFO' AND Dest = 'JFK'
)
, stats AS (
SELECT min+step*i min, min+step*(i+1) max
FROM (
SELECT max-min diff, min, max, (max-min)/10 step, GENERATE_ARRAY(0, 10, 1) i
FROM (
SELECT MIN(datapoint) min, MAX(datapoint) max
FROM data
)
), UNNEST(i) i
)
SELECT COUNT(*) count, (min+max)/2 avg
FROM data
JOIN stats
ON data.datapoint >= stats.min AND data.datapoint<stats.max
GROUP BY avg
ORDER BY avg
If you need round numbers, see: https://stackoverflow.com/a/60159876/132438
Using a cross join to get your min and max values (not that expensive on a single tuple) you can get a normalized bucket list of any given bucket count:
select
min(data.VAL) as min,
max(data.VAL) as max,
count(data.VAL) as num,
integer((data.VAL-value.min)/(value.max-value.min)*8) as group
from [table] data
CROSS JOIN (SELECT MAX(VAL) as max, MIN(VAL) as min from [table]) value
GROUP BY group
ORDER BY group
In this example we're getting 8 buckets (pretty self-explanatory) plus one for null VAL.
Write a subquery like this:
(SELECT '1' AS agegroup, count(*) FROM people WHERE AGE <= 10 AND AGE >= 0)
Then you can do something like this:
SELECT * FROM
(SELECT '1' AS agegroup, count(*) FROM people WHERE AGE <= 10 AND AGE >= 0),
(SELECT '2' AS agegroup, count(*) FROM people WHERE AGE <= 20 AND AGE >= 11),
(SELECT '3' AS agegroup, count(*) FROM people WHERE AGE <= 120 AND AGE >= 21)
Result will be like:
Row agegroup count
1 1 somenumber
2 2 somenumber
3 3 another number
I hope this helps. Of course, in the age group you can write anything you like, such as '0 to 10'.
There is now the APPROX_QUANTILES aggregation function in standard SQL.
SELECT
APPROX_QUANTILES(column, number_of_bins)
...
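A minimal sketch, reusing the mytable/age names from the earlier answer:
#standardSQL
SELECT APPROX_QUANTILES(age, 10) AS age_deciles
FROM mytable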
I found gamars' approach quite intriguing and expanded on it a little, using scripting instead of the cross join. Notably, this approach also lets you change group sizes consistently, like here with group sizes that increase exponentially.
declare stats default
(select as struct min(new_confirmed) as min, max(new_confirmed) as max
from `bigquery-public-data.covid19_open_data.covid19_open_data`
where new_confirmed >0 and date = date "2022-03-07"
);
declare group_amount default 10; -- change group amount here
SELECT
CAST(floor(
(ln(new_confirmed-stats.min+1)/ln(stats.max-stats.min+1)) * (group_amount-1))
AS INT64) group_flag,
concat('[',min(new_confirmed),',',max(new_confirmed),']') as group_value_range,
count(1) as quantity
FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
where new_confirmed >0 and date = date "2022-03-07"
GROUP BY group_flag
ORDER BY group_flag ASC
The basic approach is to label each value with its group_flag and then group by it. The flag is calculated by scaling the value down to a value between 0 and 1 and then scale it up again to 0 - group_amount.
I just took the log of the corrected value and range before their division to get the desired bias in group sizes. I also add 1 to make sure it doesn't try to take the log of 0.
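A quick worked example with invented numbers: if stats.min = 1, stats.max = 1001 and group_amount = 10, then a value of 1 gets floor(ln(1)/ln(1001) * 9) = 0, a value of 101 gets floor(4.62/6.91 * 9) = 6, and the maximum 1001 gets floor(1 * 9) = 9, whereas equal-width buckets would have put 101 into group 0.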
You're looking for a single vector of information. I would normally query it like this:
select
count(*) as num,
integer( age / 10 ) as age_group
from mytable
group by age_group
A big CASE statement will be needed for arbitrary groups. It would be simple but much longer. My example should be fine if every bucket spans the same number of years.
Take a look at this custom SQL function. It works like this:
to_bin(10, [0, 100, 500]) => '... - 100'
to_bin(1000, [0, 100, 500, 0]) => '500 - ...'
to_bin(1000, [0, 100, 500]) => NULL
Read more here
https://github.com/AdamovichAleksey/BigQueryTips/blob/main/sql/functions/to_bins.sql
Any ideas and commits are welcome.

SQL add multiple "Count" together

I'm trying to add the counts together and output the one with the max counts.
The question is: Display the person with the most medals (gold as place = 1, silver as place = 2, bronze as place = 3)
Add all the medals together and display the person with the most medals
Below is the code I have thought about (obviously doesn't work)
Any ideas?
Select cm.Givenname, cm.Familyname, count(*)
FROM Competitors cm JOIN Results re ON cm.competitornum = re.competitornum
WHERE re.place between '1' and '3'
group by cm.Givenname, cm.Familyname
having max (count(re.place = 1) + count(re.place = 2) + count(re.place = 3))
Sorry, forgot to add that we're not allowed to use ORDER BY.
Some data in the table
Competitors Table
Competitornum GivenName Familyname gender Dateofbirth Countrycode
219153 Imri Daniel Male 1988-02-02 Aus
Results Table
Eventid Competitornum Place Lane Elapsedtime
SWM111 219153 1 2 20 02
From what you've described it sounds like you just need to take the "Top" individual in the total medal count. In order to do that you would write something like this.
Select top 1 cm.Givenname, cm.Familyname, count(*)
FROM Competitors cm JOIN Results re ON cm.competitornum = re.competitornum
WHERE re.place between '1' and '3'
group by cm.Givenname, cm.Familyname
order by count(*) desc
Without using ORDER BY you have a couple of other options, though I'm glossing over whatever syntax peculiarities SQLFire may use.
You could determine the max medal count of any user and then only select competitors that have that count. You could do this by saving it out to a variable or using a subquery.
Select cm.Givenname, cm.Familyname, count(*)
FROM Competitors cm JOIN Results re ON cm.competitornum = re.competitornum
WHERE re.place between '1' and '3'
group by cm.Givenname, cm.Familyname
having count(*) = (
Select max(medal_count)
FROM (Select count(*) as medal_count
      FROM Competitors cm2 JOIN Results re2 ON cm2.competitornum = re2.competitornum
      WHERE re2.place between '1' and '3'
      group by cm2.Givenname, cm2.Familyname) counts
)
Just a note here. This second method is highly inefficient because we recalculate the max medal count for every row in the parent table. If sqlfire supports it you would be much better served by calculating this ahead of time, storing it in a variable and using that in the HAVING clause.
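A sketch of the variable version in T-SQL-style syntax (SQLFire's actual variable syntax may differ):
DECLARE @max_medals int;

SELECT @max_medals = MAX(medal_count)
FROM (SELECT COUNT(*) AS medal_count
      FROM Competitors cm JOIN Results re ON cm.competitornum = re.competitornum
      WHERE re.place BETWEEN '1' AND '3'
      GROUP BY cm.Givenname, cm.Familyname) counts;

SELECT cm.Givenname, cm.Familyname, COUNT(*)
FROM Competitors cm JOIN Results re ON cm.competitornum = re.competitornum
WHERE re.place BETWEEN '1' AND '3'
GROUP BY cm.Givenname, cm.Familyname
HAVING COUNT(*) = @max_medals;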
You are grouping by re.place, is that what you want? You want the results per ... ? :)
[edit] Good, now that's fixed you're almost there :)
The HAVING is not needed in this case; you simply need to add a count(re.EventID) to your select, make a subquery out of it, and take the max of that count column.
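If SQLFire supports CTEs, that idea could be sketched as:
WITH medal_counts AS (
    SELECT cm.Givenname, cm.Familyname, COUNT(re.EventID) AS medal_count
    FROM Competitors cm JOIN Results re ON cm.competitornum = re.competitornum
    WHERE re.place BETWEEN '1' AND '3'
    GROUP BY cm.Givenname, cm.Familyname
)
SELECT Givenname, Familyname, medal_count
FROM medal_counts
WHERE medal_count = (SELECT MAX(medal_count) FROM medal_counts)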

How to return a count(*) of 0 instead of NULL

I have this bit of code:
SELECT Project, Financial_Year, COUNT(*) AS HighRiskCount
INTO #HighRisk
FROM #TempRisk1
WHERE Risk_1 = 3
GROUP BY Project, Financial_Year
where it's not returning any rows when the count is zero. How do I make these rows appear with the HighRiskCount set as 0?
You can't select the values from the table when the row count is 0. Where would it get the values for the nonexistent rows?
To do this, you'll have to have another table that defines your list of valid Project and Financial_Year values. You'll then select from this table, perform a left join on your existing table, then do the grouping.
Something like this:
SELECT l.Project, l.Financial_Year, COUNT(t.Project) AS HighRiskCount
INTO #HighRisk
FROM MasterRiskList l
left join #TempRisk1 t on t.Project = l.Project and t.Financial_Year = l.Financial_Year and t.Risk_1 = 3
GROUP BY l.Project, l.Financial_Year
Wrap your SELECT Query in an ISNULL:
SELECT ISNULL((SELECT Project, Financial_Year, COUNT(*) AS hrc
INTO #HighRisk
FROM #TempRisk1
WHERE Risk_1 = 3
GROUP BY Project, Financial_Year),0) AS HighRiskCount
If your SELECT returns a number, it will pass through. If it returns NULL, the 0 will pass through.
Assuming the 'Project' and 'Financial_Year' combinations you intend to include also appear in rows where Risk_1 is different from 3:
SELECT Project, Financial_Year, SUM(CASE WHEN RISK_1 = 3 THEN 1 ELSE 0 END) AS HighRiskCount
INTO #HighRisk
FROM #TempRisk1
GROUP BY Project, Financial_Year
Notice I removed the WHERE clause.
By the way, your current query is not returning null, it is returning no rows.
Use:
SELECT x.Project, x.Financial_Year,
COUNT(y.project) AS HighRiskCount
INTO #HighRisk
FROM (SELECT DISTINCT t.project, t.financial_year
FROM #TempRisk1 t
WHERE t.Risk_1 = 3) x
LEFT JOIN #TempRisk1 y ON y.project = x.project
AND y.financial_year = x.financial_year
GROUP BY x.Project, x.Financial_Year
The only way to get zero counts is to use an OUTER join against a list of the distinct values you want to see zero counts for.
SQL generally has a problem returning the values that aren't in a table. To accomplish this (without a stored procedure, in any event), you'll need another table that contains the missing values.
Assuming you want one row per project / financial year combination, you'll need a table that contains each valid Project, Finanical_Year combination:
SELECT PY.Project, PY.Financial_Year, COUNT(HR.Risk_1) AS HighRiskCount
INTO #HighRisk
FROM #TempRisk1 HR
RIGHT OUTER JOIN ProjectYears PY
ON HR.Project = PY.Project AND HR.Financial_Year = PY.Financial_Year AND HR.Risk_1 = 3
GROUP BY PY.Project, PY.Financial_Year
Note that we're taking advantage of the fact that COUNT() will only count non-NULL values to get a 0 COUNT result for those result set records that are made up only of data from the new ProjectYears table.
Alternatively, you might want only one 0-count record to be returned per project (or maybe one per financial_year). You would modify the above solution so that the joined table has only that one column.
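For example, a per-project variant might look like this (Projects and #HighRiskByProject are hypothetical names for the lookup table and the result):
SELECT P.Project, COUNT(HR.Risk_1) AS HighRiskCount
INTO #HighRiskByProject
FROM #TempRisk1 HR
RIGHT OUTER JOIN Projects P
ON HR.Project = P.Project AND HR.Risk_1 = 3
GROUP BY P.Project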
Little longer, but what about this as a solution?
IF EXISTS (
SELECT *
FROM #TempRisk1
WHERE Risk_1 = 3
)
BEGIN
SELECT Project, Financial_Year, COUNT(*) AS HighRiskCount
INTO #HighRisk
FROM #TempRisk1
WHERE Risk_1 = 3
GROUP BY Project, Financial_Year
END
ELSE
BEGIN
INSERT INTO #HighRisk
SELECT 'Project', 'Financial_Year', 0
END
MSDN - ISNULL function
SELECT Project, Financial_Year, ISNULL(COUNT(*), 0) AS HighRiskCount
INTO #HighRisk
FROM #TempRisk1
WHERE Risk_1 = 3
GROUP BY Project, Financial_Year

Selecting distinct values for multiple columns

I have a table where many pieces of data in one column match to a single value in another column, similar to a tree, with data at the 'leaf' level about each specific leaf.
eg
Food Group    Name      Caloric Value
Vegetables    Broccoli  100
Vegetables    Carrots   80
Fruits        Apples    120
Fruits        Bananas   120
Fruits        Oranges   90
I would like to design a query that will return only the distinct values of each column, and then nulls to cover the overflow
eg
Food Group    Name      Caloric Value
Vegetables    Broccoli  100
Fruits        Carrots   80
              Apples    120
              Bananas   90
              Oranges
I'm not sure if this is possible. Right now I've been trying to do it with CASE expressions, but I was hoping there would be a simpler way.
Seems like you are simply trying to have all the distinct values at hand. Why? For displaying purposes? It's the application's job, not the server's. You could simply have three queries like this:
SELECT DISTINCT [Food Group] FROM atable;
SELECT DISTINCT Name FROM atable;
SELECT DISTINCT [Caloric Value] FROM atable;
and display their results accordingly.
But if you insist on having them all in one table, you might try this:
WITH atable ([Food Group], Name, [Caloric Value]) AS (
SELECT 'Vegetables', 'Broccoli', 100 UNION ALL
SELECT 'Vegetables', 'Carrots', 80 UNION ALL
SELECT 'Fruits', 'Apples', 120 UNION ALL
SELECT 'Fruits', 'Bananas', 120 UNION ALL
SELECT 'Fruits', 'Oranges', 90
),
atable_numbered AS (
SELECT
[Food Group], Name, [Caloric Value],
fg_rank = DENSE_RANK() OVER (ORDER BY [Food Group]),
n_rank = DENSE_RANK() OVER (ORDER BY Name),
cv_rank = DENSE_RANK() OVER (ORDER BY [Caloric Value])
FROM atable
)
SELECT
fg.[Food Group],
n.Name,
cv.[Caloric Value]
FROM (
SELECT fg_rank FROM atable_numbered UNION
SELECT n_rank FROM atable_numbered UNION
SELECT cv_rank FROM atable_numbered
) r (rank)
LEFT JOIN (
SELECT DISTINCT [Food Group], fg_rank
FROM atable_numbered) fg ON r.rank = fg.fg_rank
LEFT JOIN (
SELECT DISTINCT Name, n_rank
FROM atable_numbered) n ON r.rank = n.n_rank
LEFT JOIN (
SELECT DISTINCT [Caloric Value], cv_rank
FROM atable_numbered) cv ON r.rank = cv.cv_rank
ORDER BY r.rank
I guess what I would want to know is why you need this in one result set? What does the code look like that would consume this result? The attributes on each row have nothing to do with each other. If you want to, say, build the contents of a set of drop-down boxes, you're better off doing these one at a time. In your requested result set, you'd need to iterate through the dataset three times to do anything useful, and you would need to either check for NULL each time or needlessly iterate all the way to the end of the dataset.
If this is in a stored procedure, couldn't you run three separate SELECT DISTINCT queries and return the values as three result sets? Then you can consume them one at a time, which is what you would be doing anyway, I would guess.
If there REALLY IS a connection between the values, you could add each of the results to an array or list, then access all three lists in parallel using the index.
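Going back to the three-result-sets suggestion above, a sketch of the stored procedure could be (the procedure name is a placeholder, and atable stands for your real table):
CREATE PROCEDURE dbo.GetDistinctFoodValues
AS
BEGIN
    -- Three independent result sets; the client consumes them one at a time.
    SELECT DISTINCT [Food Group] FROM atable;
    SELECT DISTINCT Name FROM atable;
    SELECT DISTINCT [Caloric Value] FROM atable;
END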
Something like this maybe?
select *
from (
select case
when row_number() over (partition by fruit_group order by fruit_group) = 1 then fruit_group
else null
end as fruit_group,
case
when row_number() over (partition by name order by name) = 1 then name
else null
end as name,
case
when row_number() over (partition by caloric order by caloric) = 1 then caloric
else null
end as caloric
from your_table
) t
where fruit_group is not null
or name is not null
or caloric is not null
But I fail to see any sense in this