Getting a count based on the sum of values in other columns - SQL

PID | ChildPID | value
------|-----------|-------
3835 | 3934 | 1
3835 | 3935 | 0
3835 | 3936 | 0
3835 | 3939 | 1
3836 | 3940 | 0
3836 | 3941 | 0
3836 | 3942 | 0
and I need results like:
PIDCountinvalue| Childcountinvalue | PIDCountoutvalue| Childcountoutvalue
---------------|--------------------|-----------------|-------------------|
1 | 2 | 1 | 5
Meaning: I need to get the counts of PID and ChildPID based on the sum of the value column for each. If all the ChildPIDs belonging to one PID have a value of 0, that PID is counted in PIDCountoutvalue; otherwise, if the sum over all the ChildPIDs of that PID is > 0, it is counted in PIDCountinvalue. The Childcount in/out columns are simply based on each ChildPID's own value.
Explanation:
PIDCountinvalue| Childcountinvalue | PIDCountoutvalue| Childcountoutvalue
---------------|--------------------|-----------------|-------------------|
1(3835) | 2 (3934,3939) | 1 (3836) |5(3935,3936,3940,3941,3942)
There are two PIDs in total (3835, 3836). PID 3835 has four ChildPIDs (3934, 3935, 3936, 3939) whose values sum to more than 0, and PID 3836 has three ChildPIDs (3940, 3941, 3942) whose values sum to 0. So, summing the values of the ChildPIDs under their respective PIDs: if that sum is 0, the corresponding PID is counted as PIDCountoutvalue, otherwise it comes under PIDCountinvalue. That is why 3836 is in the out-value count and 3835 is in the in-value count.

Your explanation is not very clear and I cannot quite follow your logic, but here is some example code that may help you. You can easily modify it to fit your needs.
CREATE TABLE Table1
([PID] varchar(6), [ChildPID] varchar(11), [value] int)
;
INSERT INTO Table1
([PID], [ChildPID], [value])
VALUES
('3835', '3934', '1'),
('3835', '3935', '0'),
('3835', '3936', '0'),
('3835', '3939', '1'),
('3836', '3940', '0'),
('3836', '3941', '0'),
('3836', '3942', '0')
;
--You can use a common table expression to "build" a table and then to query what you need.
with
cte as
(
select PID, ChildPID, sum(value) [Sum]
from Table1
group by PID, ChildPID
)
--Now just build your columns with the aggregated data you need.
select
(select
count(distinct PID)
from cte
where [Sum] > 0) as PIDCountinvalue --Get count of PIDs with sum of values more than 0
,(select
count(ChildPID)
from cte
where [Sum] > 0) as Childcountinvalue --Get count of ChildPIDs that have values larger than 0
,(select count(*) from
(select
PID
from cte
group by PID
having sum([Sum]) <= 0) I) as PIDCountoutvalue --Get count of PIDs whose total value is 0 (or less)
,(select
count(ChildPID)
from cte
where [Sum] <= 0) as Childcountoutvalue --Get count of ChildPIDs whose value is 0 (or less)
The output of that query is exactly what you wanted. However, since your logic is not clear to me, this may not be the final solution for your complete data, but I hope it helps at least a bit.
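If the intended logic really is "a PID counts as out-value only when all of its children sum to zero", a slightly more compact variant is possible (just a sketch against the same sample table above, not verified beyond it):
with PerChild as
(
select PID, ChildPID, sum(value) as ChildSum
from Table1
group by PID, ChildPID
),
PerPID as
(
select PID, sum(ChildSum) as PIDSum
from PerChild
group by PID
)
select
(select count(*) from PerPID where PIDSum > 0) as PIDCountinvalue
,(select count(*) from PerChild where ChildSum > 0) as Childcountinvalue
,(select count(*) from PerPID where PIDSum = 0) as PIDCountoutvalue
,(select count(*) from PerChild where ChildSum = 0) as Childcountoutvalue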

Related

How to pivot column data into a row where a maximum qty total cannot be exceeded?

Introduction:
I have come across an unexpected challenge. I'm hoping someone can help, and I am interested in the best method for manipulating the data to solve this problem.
Scenario:
I need to combine column data associated to two different ID columns. Each row that I have associates an item_id and the quantity for this item_id. Please see below for an example.
+-------+-------+-------+---+
|cust_id|pack_id|item_id|qty|
+-------+-------+-------+---+
| 1 | A | 1 | 1 |
| 1 | A | 2 | 1 |
| 1 | A | 3 | 4 |
| 1 | A | 4 | 0 |
| 1 | A | 5 | 0 |
+-------+-------+-------+---+
I need to manipulate the data shown above so that 24 rows (for 24 item_ids) are combined into a single row. In the example above I have chosen 5 items to make things easier. The selection format I wish to get, assuming 5 item_ids, can be seen below.
+---------+---------+---+---+---+---+---+
| cust_id | pack_id | 1 | 2 | 3 | 4 | 5 |
+---------+---------+---+---+---+---+---+
| 1 | A | 1 | 1 | 4 | 0 | 0 |
+---------+---------+---+---+---+---+---+
However, here's the condition that is making this troublesome. The maximum total quantity for each row must not exceed 5. If the total quantity exceeds 5, a new row associated with the cust_id and pack_id must be created for the rest of the item_id quantities. Please see below for the desired output.
+---------+---------+---+---+---+---+---+
| cust_id | pack_id | 1 | 2 | 3 | 4 | 5 |
+---------+---------+---+---+---+---+---+
| 1 | A | 1 | 1 | 3 | 0 | 0 |
| 1 | A | 0 | 0 | 1 | 0 | 0 |
+---------+---------+---+---+---+---+---+
Notice how the quantities of item_ids 1, 2 and 3 summed together equal 6. This exceeds the maximum total quantity of 5 for each row. For the second row the difference is created. In this case only item_id 3 has a single quantity remaining.
Note: if a 2nd row needs to be created, the total quantity displayed in that row also cannot exceed 5. There is a known item_id limit of 24, but there is no known limit on the quantity associated with each item_id.
Here's an approach which comes a bit from left field.
One approach would have been to do a recursive CTE, building the rows one-by-one.
Instead, I've taken an approach where I
Create a new (virtual) table with 1 row per item (so if there are 6 items, there will be 6 rows)
Group those items into groups of 5 (I've called these rn_batches)
Pivot those (based on counts per item per rn_batch)
For these, processing is relatively simple
Creating one row per item is done using INNER JOIN to a numbers table with n <= the relevant quantity.
The grouping then just assigns rn_batch = 1 for the first 5 items, rn_batch = 2 for the next 5 items, etc - until there are no more items left for that order (based on cust_id/pack_id).
Here is the code
/* Data setup */
CREATE TABLE #Order (cust_id int, pack_id varchar(1), item_id int, qty int, PRIMARY KEY (cust_id, pack_id, item_id))
INSERT INTO #Order (cust_id, pack_id, item_id, qty) VALUES
(1, 'A', 1, 1),
(1, 'A', 2, 1),
(1, 'A', 3, 4),
(1, 'A', 4, 0),
(1, 'A', 5, 0);
/* Pivot results */
WITH Nums(n) AS
(SELECT (c * 100) + (b * 10) + (a) + 1 AS n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) A(a)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) B(b)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) C(c)
),
ItemBatches AS
(SELECT cust_id, pack_id, item_id,
FLOOR((ROW_NUMBER() OVER (PARTITION BY cust_id, pack_id ORDER BY item_id, N.n)-1) / 5) + 1 AS rn_batch
FROM #Order O
INNER JOIN Nums N ON N.n <= O.qty
)
SELECT *
FROM (SELECT cust_id, pack_id, rn_batch, 'Item_' + LTRIM(STR(item_id)) AS item_desc
FROM ItemBatches
) src
PIVOT
(COUNT(item_desc) FOR item_desc IN ([Item_1], [Item_2], [Item_3], [Item_4], [Item_5])) pvt
ORDER BY cust_id, pack_id, rn_batch;
And here are results
cust_id pack_id rn_batch Item_1 Item_2 Item_3 Item_4 Item_5
1 A 1 1 1 3 0 0
1 A 2 0 0 1 0 0
Here's a db<>fiddle with additional data in the #Order table, the answer above, and also the processing with each step separated.
Notes
This approach (with the virtual numbers table) assumes a maximum of 1,000 for a given item in an order. If you need more, you can easily extend that numbers table by adding additional CROSS JOINs.
While I am in awe of the coders who made SQL Server and how it determines execution plans in milliseconds, for larger datasets I give SQL Server zero chance of accurately predicting how many rows will be in each step. As such, for performance, it may work better to split the code up into parts (including temp tables) similar to the db<>fiddle example.
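To illustrate the first note: one extra CROSS JOIN (a sketch, not part of the original fiddle) extends the virtual numbers table from 1,000 to 10,000 rows.
WITH Nums(n) AS
(SELECT (d * 1000) + (c * 100) + (b * 10) + (a) + 1 AS n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) A(a)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) B(b)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) C(c)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) D(d)
)
SELECT COUNT(*) AS n_count FROM Nums; -- returns 10000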

SQL Identify records which occur more than once in the same year

I have records in which a set of procedure codes should only occur once per year per member. I'm trying to identify occurrences where this rule is broken.
I've tried the below SQL, is that correct?
Table
+---------------+--------+-------------+
| ProcedureCode | Member | ServiceDate |
+---------------+--------+-------------+
| G0443 | 1234 | 01-03-2017 |
+---------------+--------+-------------+
| G0443 | 1234 | 05-03-2018 |
+---------------+--------+-------------+
| G0443 | 1234 | 07-03-2018 |
+---------------+--------+-------------+
| G0444 | 3453 | 01-03-2017 |
+---------------+--------+-------------+
| G0443 | 5676 | 07-03-2018 |
+---------------+--------+-------------+
Expected results where rule is broken
+---------------+--------+
| ProcedureCode | Member |
+---------------+--------+
| G0443 | 1234 |
+---------------+--------+
SQL
Select ProcedureCD, Mbr_Id
From CLAIMS
Where ProcedureCD IN ('G0443', 'G0444')
GROUP BY ProcedureCD,Mbr_Id, YEAR(ServiceFromDate)
having count(YEAR(ServiceFromDate))>1
The query you've written will work (if you correct the column names; your query uses different column names from the sample data you posted). It can be simplified visually by using COUNT(*) in the HAVING clause. COUNT accumulates a 1 for each non-null value and a 0 for each null, but there isn't any significance to using YEAR inside the count in this case, because all the dates are non-null and COUNT isn't interested in the value: count(*), count(1), count(0), and count(Member) would all work equally well here.
The only time count(column) works differently from count(*) is when the column contains null values. There is also a variant of COUNT where you put DISTINCT inside the brackets, which causes the counting to ignore repeated values.
COUNT(DISTINCT column) on a column containing the 6 values 1, 1, 2, null, 3, 3 would return 3 (3 unique non-null values). COUNT(column) on the same column would return 5 (5 non-null values), and COUNT(*) would return 6.
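To see those three behaviours side by side, here is a small throwaway example (the table variable and column name are made up purely for illustration):
DECLARE @demo TABLE (val int NULL);
INSERT INTO @demo (val) VALUES (1), (1), (2), (NULL), (3), (3);
SELECT COUNT(*) AS all_rows, -- 6
COUNT(val) AS non_null_values, -- 5
COUNT(DISTINCT val) AS distinct_values -- 3
FROM @demo;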
You should understand that by putting YEAR(...) in the GROUP BY but not the SELECT, you might produce duplicate-looking rows in the output. For example, if you also had these rows:
Member, Code, Date
1234, G0443, 1-1-19
1234, G0443, 2-1-19
If you're grouping on year (but not showing it), then you'll see:
1234, G0443 --it's for year 2018
1234, G0443 --it's for year 2019
Personally I think it'd be handy to show the year in the select list, so you can better pinpoint where the problem is, but if you want to squish these duplicate rows, do a SELECT DISTINCT. Alternatively, leverage the difference between COUNT and COUNT DISTINCT: remove the year from the GROUP BY and instead say HAVING COUNT(*) > COUNT(DISTINCT YEAR(ServiceDate)).
As discussed above, COUNT(*) will be greater than COUNT(DISTINCT YEAR(ServiceDate)) if there are duplicated years.
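Written out against the sample table (a sketch using the column names from the question's sample data), that alternative looks like:
SELECT ProcedureCode, Member
FROM CLAIMS
WHERE ProcedureCode IN ('G0443', 'G0444')
GROUP BY ProcedureCode, Member
HAVING COUNT(*) > COUNT(DISTINCT YEAR(ServiceDate));
The query below, by contrast, keeps the year in the select list so you can see exactly which year breaks the rule: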
Select ProcedureCode, Member,YEAR(ServiceDate) [Year],Count(*) Occurences
From CLAIMS
Where ProcedureCode IN ('G0443', 'G0444')
GROUP BY ProcedureCode, Member,YEAR(ServiceDate)
HAVING Count(*) > 1
Hope this code helps you:
create table #temp (ProcedureCode varchar(20),Member varchar(20),ServiceDate Date)
insert into #temp (ProcedureCode,Member,ServiceDate) values ('G0443','1234','01-03-2017')
insert into #temp (ProcedureCode,Member,ServiceDate) values ('G0443','1234','05-03-2018 ')
insert into #temp (ProcedureCode,Member,ServiceDate) values ('G0443','1234','07-03-2018')
insert into #temp (ProcedureCode,Member,ServiceDate) values ('G0444','3453','01-03-2017')
insert into #temp (ProcedureCode,Member,ServiceDate) values ('G0443','5676','07-03-2018')
select ProcedureCode,Member from #temp
where YEAR(ServiceDate) in (select YEAR(ServiceDate) from #temp group by
YEAR(ServiceDate) having count(*) > 1)
and Member in (Select Member from #temp group by Member having count(Member)>1)
Group by ProcedureCode,Member
drop table #temp

Determining total of consecutive values using SQL

I would like to determine the number of consecutive absences as per the following table. Initial research suggests I may be able to achieve this using a window function. For the data provided, the longest streak is four consecutive occurrences. Please can you advise how I can set a running absence total as a separate column.
create table events (eventdate date, absence int);
insert into events values ('2014-10-01', 0);
insert into events values ('2014-10-08', 1);
insert into events values ('2014-10-15', 1);
insert into events values ('2014-10-22', 0);
insert into events values ('2014-11-05', 0);
insert into events values ('2014-11-12', 1);
insert into events values ('2014-11-19', 1);
insert into events values ('2014-11-26', 1);
insert into events values ('2014-12-03', 1);
insert into events values ('2014-12-10', 0);
Based on Gordon Linoff's answer here, you could do:
SELECT TOP 1
MIN(eventdate) AS spanStart ,
MAX(eventdate) AS spanEnd,
COUNT(*) AS spanLength
FROM ( SELECT e.* ,
( ROW_NUMBER() OVER ( ORDER BY eventdate )
- ROW_NUMBER() OVER ( PARTITION BY absence ORDER BY eventdate ) ) AS grp
FROM events e
) t
GROUP BY grp ,
absence
HAVING absence = 1
ORDER BY COUNT(*) DESC;
Which returns:
spanStart  | spanEnd    | spanLength
---------------------------------------
2014-11-12 | 2014-12-03 | 4
You don't specify which RDBMS you are using, but the following works under postgresql's window functions and should be translatable to similar SQL engines:
SELECT eventdate,
absence,
-- XXX We take advantage of the fact that absence is an int (1 or 0)
-- otherwise we'd COUNT(1) OVER (...) and only conditionally
-- display the count if absence = 1
SUM(absence) OVER (PARTITION BY span ORDER BY eventdate)
AS consecutive_absences
FROM (SELECT spanstarts.*,
SUM(newspan) OVER (ORDER BY eventdate) AS span
FROM (SELECT events.*,
CASE LAG(absence) OVER (ORDER BY eventdate)
WHEN absence THEN NULL
ELSE 1 END AS newspan
FROM events)
spanstarts
) eventsspans
ORDER BY eventdate;
which gives you:
eventdate | absence | consecutive_absences
------------+---------+----------------------
2014-10-01 | 0 | 0
2014-10-08 | 1 | 1
2014-10-15 | 1 | 2
2014-10-22 | 0 | 0
2014-11-05 | 0 | 0
2014-11-12 | 1 | 1
2014-11-19 | 1 | 2
2014-11-26 | 1 | 3
2014-12-03 | 1 | 4
2014-12-10 | 0 | 0
There is an excellent dissection of the above approach on the pgsql-general mailing list. The short of it is:
Innermost query (spanstarts) uses LAG to find the start of new
spans of absences, whether a span of 1's or a span of 0's
Next query (eventsspans) identifies those spans by summing the number of new spans that have come before us. So, we find span 1, then span 2, then 3, etc.
The outer query then counts the number of absences in each span.
As the SQL comment says, we cheat a little bit on #3, taking advantage of its data type, but the net effect is the same.
I don't know what your DBMS is, but this is for SQL Server. Hopefully it is of some help :)
-------------------------------------------------------------------------------------------
Query:
--tableRN is used to get the row number
;with tableRN as (SELECT a.*, ROW_NUMBER() OVER (ORDER BY a.eventdate) as rn, COUNT(*) as maxRN
FROM events a GROUP BY a.eventdate, a.absence),
--cte is a recursive CTE that returns the
--absence value, the level (number of times 1 appeared in a row),
--rn (row number), and total (the running count of streaks)
cte (absence, level, rn, total) AS (
SELECT 0, 0, 1, 0
UNION ALL
SELECT r.absence,
CASE WHEN c.absence = 1 AND r.absence = 1 THEN level + 1
ELSE 0
END,
c.rn + 1,
CASE WHEN c.level = 1 THEN total + 1
ELSE total
END
FROM cte c JOIN tableRN r ON c.rn + 1 = r.rn)
--This gets you the total count of times there
--was a consecutive absence (twice or more in a row).
SELECT MAX(c.total) AS [Count] FROM cte c
-------------------------------------------------------------------------------------------
Results:
|Count|
+-----+
| 2 |
Create a new column called consecutive_absence_count with default 0.
You may write a SQL procedure for the insert: fetch the latest record, retrieve its absence value, and determine whether the new record to be inserted marks a presence or an absence.
If the latest and the new record have consecutive dates and the new record is an absence, increment the consecutive_absence_count; otherwise set it to 0.
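As a rough sketch of that idea in T-SQL (the procedure name is made up, it assumes the consecutive_absence_count column has already been added to the events table, and the "consecutive dates" check is left out for brevity):
CREATE PROCEDURE insert_event
    @eventdate date,
    @absence int
AS
BEGIN
    DECLARE @prev_streak int = 0;
    -- Fetch the latest record's running streak (stays 0 if the table is empty
    -- or the previous event was not an absence).
    SELECT TOP 1 @prev_streak = consecutive_absence_count
    FROM events
    WHERE eventdate < @eventdate
    ORDER BY eventdate DESC;
    -- An absence extends the previous streak; a presence resets it to 0.
    INSERT INTO events (eventdate, absence, consecutive_absence_count)
    VALUES (@eventdate, @absence,
            CASE WHEN @absence = 1 THEN @prev_streak + 1 ELSE 0 END);
END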

Disassemble string, group, and reconstruct in Oracle SQL

So here is what a sample of my data looks like:
ID | Amount
1111-1 | 5
1111-1 | -5
1111-2 | 5
1111-2 | -5
12R-1 | 8
12R-1 | -8
12R-3 | 8
12R-3 | -8
54A73-1| 2
54A73-1| -2
54A73-2| 2
54A73-2| -1
What I want to do is group by the string in the ID column before the dash, and find the groups of IDs whose Amounts sum to zero. The kicker is that after I find which groups of IDs sum to zero, I want to add back the dash and the number following the dash.
Here is what I hope the solution to look like:
ID | Amount
1111-1 | 5
1111-1 | -5
1111-2 | 5
1111-2 | -5
12R-1 | 8
12R-1 | -8
12R-3 | 8
12R-3 | -8
Notice how the IDs starting with 54A73 are not there anymore; that's because the sum of their Amounts is not equal to zero.
Any help solving this question would be much appreciated!
Here's one option joining the table back to itself after grouping by the beginning part of the id field using left and locate:
MySQL Version
select id, amount
from yourtable t
join (
select left(id, locate('-', id)-1) shortid
from yourtable
group by left(id, locate('-', id)-1)
having sum(amount) = 0
) t2 on left(t.id, locate('-', t.id)-1) = t2.shortid
SQL Fiddle Demo
Oracle Version
select id, amount
from yourtable t
join (
select substr(id, 0, instr(id,'-')-1) shortid
from yourtable
group by substr(id, 0, instr(id,'-')-1)
having sum(amount) = 0
) t2 on substr(t.id, 0, instr(t.id,'-')-1) = t2.shortid
More Fiddle
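As an aside, in Oracle you could also avoid the self-join by using an analytic SUM over the same prefix expression (just a sketch of the idea, not one of the fiddles above):
select id, amount
from (
select t.*,
sum(amount) over (partition by substr(id, 1, instr(id, '-') - 1)) as grp_sum
from yourtable t
)
where grp_sum = 0;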

In SQL, what's the difference between count(column) and count(*)?

I have the following query:
select column_name, count(column_name)
from table
group by column_name
having count(column_name) > 1;
What would be the difference if I replaced all calls to count(column_name) to count(*)?
This question was inspired by How do I find duplicate values in a table in Oracle?.
To clarify the accepted answer (and maybe my question), replacing count(column_name) with count(*) would return an extra row in the result that contains a null and the count of null values in the column.
count(*) counts NULLs and count(column) does not
[edit] added this code so that people can run it
create table #bla(id int,id2 int)
insert #bla values(null,null)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,null)
select count(*),count(id),count(id2)
from #bla
results
7 3 2
Another minor difference, between using * and a specific column, is that in the column case you can add the keyword DISTINCT, and restrict the count to distinct values:
select column_a, count(distinct column_b)
from table
group by column_a
having count(distinct column_b) > 1;
A further and perhaps subtle difference is that in some database implementations the count(*) is computed by looking at the indexes on the table in question rather than the actual data rows. Since no specific column is specified, there is no need to bother with the actual rows and their values (as there would be if you counted a specific column). Allowing the database to use the index data can be significantly faster than making it count "real" rows.
The explanation in the docs helps to explain this:
COUNT(*) returns the number of items in a group, including NULL values and duplicates.
COUNT(expression) evaluates expression for each row in a group and returns the number of nonnull values.
So count(*) includes nulls, the other method doesn't.
We can use the Stack Exchange Data Explorer to illustrate the difference with a simple query. The Users table in Stack Overflow's database has columns that are often left blank, like the user's Website URL.
-- count(column_name) vs. count(*)
-- Illustrates the difference between counting a column
-- that can hold null values, a 'not null' column, and count(*)
select count(WebsiteUrl), count(Id), count(*) from Users
If you run the query above in the Data Explorer, you'll see that the count is the same for count(Id) and count(*) because the Id column doesn't allow null values. The WebsiteUrl count is much lower, though, because that column allows null.
COUNT(*) tells SQL Server to count all the rows in a table, including NULLs.
COUNT(column_name) counts only the rows that have a non-null value in that column.
Please see the following code for test executions on SQL Server 2008:
-- Variable table
DECLARE #Table TABLE
(
CustomerId int NULL
, Name nvarchar(50) NULL
)
-- Insert some records for tests
INSERT INTO #Table VALUES( NULL, 'Pedro')
INSERT INTO #Table VALUES( 1, 'Juan')
INSERT INTO #Table VALUES( 2, 'Pablo')
INSERT INTO #Table VALUES( 3, 'Marcelo')
INSERT INTO #Table VALUES( NULL, 'Leonardo')
INSERT INTO #Table VALUES( 4, 'Ignacio')
-- Get all the rows by indicating *
SELECT COUNT(*) AS 'AllRowsCount'
FROM #Table
-- Get only rows with content ( exclude NULLs )
SELECT COUNT(CustomerId) AS 'OnlyNotNullCounts'
FROM #Table
COUNT(*) – Returns the total number of records in a table (Including NULL valued records).
COUNT(Column Name) – Returns the total number of non-NULL records. That means it ignores NULL-valued records in that particular column.
Basically, the COUNT(*) function returns all the rows from a table, whereas COUNT(COLUMN_NAME) does not; that is, it excludes null values, as everyone here has also answered. The more interesting part is that, to keep queries and the database optimized, it is better to use COUNT(*) rather than COUNT(COLUMN_NAME) unless you are doing multiple counts or a complex query. Otherwise, it can really lower your DB performance when dealing with a huge amount of data.
Further elaborating upon the answers given by SQLMenace and Brannon, making use of the GROUP BY clause, which was mentioned by the OP but is not present in SQLMenace's answer:
CREATE TABLE table1 (
id INT
);
INSERT INTO table1 VALUES
(1),
(2),
(NULL),
(2),
(NULL),
(3),
(1),
(4),
(NULL),
(2);
SELECT * FROM table1;
+------+
| id |
+------+
| 1 |
| 2 |
| NULL |
| 2 |
| NULL |
| 3 |
| 1 |
| 4 |
| NULL |
| 2 |
+------+
10 rows in set (0.00 sec)
SELECT id, COUNT(*) FROM table1 GROUP BY id;
+------+----------+
| id | COUNT(*) |
+------+----------+
| 1 | 2 |
| 2 | 3 |
| NULL | 3 |
| 3 | 1 |
| 4 | 1 |
+------+----------+
5 rows in set (0.00 sec)
Here, COUNT(*) counts the number of occurrences of each type of id including NULL
SELECT id, COUNT(id) FROM table1 GROUP BY id;
+------+-----------+
| id | COUNT(id) |
+------+-----------+
| 1 | 2 |
| 2 | 3 |
| NULL | 0 |
| 3 | 1 |
| 4 | 1 |
+------+-----------+
5 rows in set (0.00 sec)
Here, COUNT(id) counts the number of occurrences of each type of id but does not count the number of occurrences of NULL
SELECT id, COUNT(DISTINCT id) FROM table1 GROUP BY id;
+------+--------------------+
| id | COUNT(DISTINCT id) |
+------+--------------------+
| NULL | 0 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
+------+--------------------+
5 rows in set (0.00 sec)
Here, COUNT(DISTINCT id) counts the number of occurrences of each type of id only once (does not count duplicates) and also does not count the number of occurrences of NULL
It is best to use Count(1) in place of a column name or * to count the number of rows in a table; it is faster than any other format because it never has to check whether the column name exists in the table.
There is no difference if only one column is fixed in your table; if you want to use more than one column, you have to specify how many columns you are required to count.
Thanks,
As mentioned in the previous answers, Count(*) counts even rows where the column is NULL, whereas count(ColumnName) counts a row only if the column has a value.
It's always best practice to avoid * (Select *, count *, …)