SQL Identify records which occur more than once in the same year - sql

I have a set of records in which certain procedure codes should occur only once per year per member. I'm trying to identify occurrences where this rule is broken.
I've tried the SQL below; is it correct?
Table
+---------------+--------+-------------+
| ProcedureCode | Member | ServiceDate |
+---------------+--------+-------------+
| G0443 | 1234 | 01-03-2017 |
+---------------+--------+-------------+
| G0443 | 1234 | 05-03-2018 |
+---------------+--------+-------------+
| G0443 | 1234 | 07-03-2018 |
+---------------+--------+-------------+
| G0444 | 3453 | 01-03-2017 |
+---------------+--------+-------------+
| G0443 | 5676 | 07-03-2018 |
+---------------+--------+-------------+
Expected results where rule is broken
+---------------+--------+
| ProcedureCode | Member |
+---------------+--------+
| G0443 | 1234 |
+---------------+--------+
SQL
Select ProcedureCD, Mbr_Id
From CLAIMS
Where ProcedureCD IN ('G0443', 'G0444')
GROUP BY ProcedureCD,Mbr_Id, YEAR(ServiceFromDate)
having count(YEAR(ServiceFromDate))>1

The query you've written will work once you correct the column names (your query uses different column names from the sample data you posted). It can be simplified visually by using COUNT(*) in the HAVING clause. COUNT adds 1 for every non-null value and 0 for every null, so there is no significance to using YEAR inside the count in this case: all the dates are non-null and COUNT isn't interested in the value itself. count(*), count(1), count(0) and count(member) would all work equally well here.
The only time count(column) works differently to count(*) is when column contains null values. There is also an option of COUNT where you put DISTINCT inside the brackets, and this causes the counting to ignore repeated values.
COUNT DISTINCT on a table column that contains 6 rows of values 1, 1, 2, null, 3, 3 would return 3 (3 unique values). COUNTing the same column would return 5 (5 non null values), COUNT(*) would return 6
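The three counts described above can be checked with a small script. This is only a sketch: it assumes SQLite driven through Python's sqlite3 module, and the table name t and column name v are made up for the illustration.

```python
import sqlite3

# Hypothetical one-column table holding the six values from the text:
# 1, 1, 2, NULL, 3, 3
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")
conn.executemany(
    "INSERT INTO t (v) VALUES (?)",
    [(1,), (1,), (2,), (None,), (3,), (3,)],
)

# COUNT(*) counts rows, COUNT(v) counts non-null values,
# COUNT(DISTINCT v) counts distinct non-null values.
count_star, count_col, count_distinct = conn.execute(
    "SELECT COUNT(*), COUNT(v), COUNT(DISTINCT v) FROM t"
).fetchone()

print(count_star, count_col, count_distinct)  # 6 5 3
```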
You should understand that by putting the YEAR(...) in the group by but not the select, you might produce duplicate-looking rows in the output. For example if you had these rows also:
Member, Code, Date
1234, G0443, 1-1-19
1234, G0443, 2-1-19
And you're grouping on year (but not showing it) then you'll see:
1234, G0443 --it's for year 2018
1234, G0443 --it's for year 2019
Personally I think it'd be handy to show the year in the select list, so you can better pinpoint where the problem is, but if you want to squish these duplicate rows, do a SELECT DISTINCT. Alternatively, leverage the difference between count and count distinct: remove the year from the GROUP BY and instead say HAVING COUNT(*) > COUNT(DISTINCT YEAR(ServiceDate)).
As discussed above, count(*) will be greater than count(distinct year) whenever a year is duplicated.
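Here is a rough, runnable sketch of both variants. It assumes SQLite via Python's sqlite3 module rather than SQL Server, so YEAR(ServiceDate) is replaced by strftime('%Y', ServiceDate) and the dates are stored as ISO text. The table name claims and the exact day/month values are assumptions; only the years matter here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims (ProcedureCode TEXT, Member TEXT, ServiceDate TEXT)"
)
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [
        ("G0443", "1234", "2017-03-01"),
        ("G0443", "1234", "2018-03-05"),
        ("G0443", "1234", "2018-03-07"),
        ("G0444", "3453", "2017-03-01"),
        ("G0443", "5676", "2018-03-07"),
        # the two extra 2019 rows from the example above
        ("G0443", "1234", "2019-01-01"),
        ("G0443", "1234", "2019-01-02"),
    ],
)

# Variant 1: group by year and show it, one row per offending year.
per_year = conn.execute("""
    SELECT ProcedureCode, Member, strftime('%Y', ServiceDate) AS Yr, COUNT(*)
    FROM claims
    WHERE ProcedureCode IN ('G0443', 'G0444')
    GROUP BY ProcedureCode, Member, strftime('%Y', ServiceDate)
    HAVING COUNT(*) > 1
    ORDER BY Yr
""").fetchall()

# Variant 2: no year in the GROUP BY; a member/code pair is flagged when it
# has more rows than distinct years, i.e. at least one year repeats.
per_member = conn.execute("""
    SELECT ProcedureCode, Member
    FROM claims
    WHERE ProcedureCode IN ('G0443', 'G0444')
    GROUP BY ProcedureCode, Member
    HAVING COUNT(*) > COUNT(DISTINCT strftime('%Y', ServiceDate))
""").fetchall()

print(per_year)
print(per_member)
```

The first query returns one row for 2018 and one for 2019, while the second returns the member/code pair only once.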

Select ProcedureCode, Member,YEAR(ServiceDate) [Year],Count(*) Occurrences
From CLAIMS
Where ProcedureCode IN ('G0443', 'G0444')
GROUP BY ProcedureCode, Member,YEAR(ServiceDate)
HAVING Count(*) > 1

Hope this code helps:
create table #temp (ProcedureCode varchar(20),Member varchar(20),ServiceDate Date)
insert into #temp (ProcedureCode,Member,ServiceDate) values ('G0443','1234','01-03-2017')
insert into #temp (ProcedureCode,Member,ServiceDate) values ('G0443','1234','05-03-2018 ')
insert into #temp (ProcedureCode,Member,ServiceDate) values ('G0443','1234','07-03-2018')
insert into #temp (ProcedureCode,Member,ServiceDate) values ('G0444','3453','01-03-2017')
insert into #temp (ProcedureCode,Member,ServiceDate) values ('G0443','5676','07-03-2018')
select ProcedureCode,Member from #temp
-- group per code, member and year so only repeats within the same year are flagged
group by ProcedureCode,Member,YEAR(ServiceDate)
having count(*)>1
drop table #temp

Related

HQL, insert two rows if a condition is met

I have the following table called table_persons in Hive:
+--------+------+------------+
| people | type | date |
+--------+------+------------+
| lisa | bot | 19-04-2022 |
| wayne | per | 19-04-2022 |
+--------+------+------------+
If type is "bot", I have to add two rows to the table d1_info; if type is "per", I only have to add one row, so the result is the following:
+---------+------+------------+
| db_type | info | date |
+---------+------+------------+
| x_bot | x | 19-04-2022 |
| x_bnt | x | 19-04-2022 |
| x_per | b | 19-04-2022 |
+---------+------+------------+
How can I add two rows if this condition is met?
With a CASE WHEN, maybe?
You may try using a UNION ALL to duplicate the rows with type bot. The following example unions a first query, which selects all records, with a second query, which selects only those with type bot.
Edit
In response to the edited question, I have added an additional flag column (storing 1 or 0) named original to differentiate the original entry from its duplicate:
SELECT
p1.*,
1 as original
FROM
table_persons p1
UNION ALL
SELECT
p1.*,
0 as original
FROM
table_persons p1
WHERE p1.type='bot'
You may then insert this into your other table d1_info, using the above query as a subquery or CTE and applying the desired transformations with CASE expressions, e.g.:
INSERT INTO d1_info
(`db_type`, `info`, `date`)
WITH merged_data AS (
SELECT
p1.*,
1 as original
FROM
table_persons p1
UNION ALL
SELECT
p1.*,
0 as original
FROM
table_persons p1
WHERE p1.type='bot'
)
SELECT
CONCAT('x_',CASE
WHEN m1.type='per' THEN m1.type
WHEN m1.original=1 AND m1.type='bot' THEN m1.type
ELSE 'bnt'
END) as db_type,
CASE
WHEN m1.type='per' THEN 'b'
ELSE 'x'
END as info,
m1.date
FROM
merged_data m1
ORDER BY m1.people,m1.date;
I think what you want is to create a new table that captures your logic. This would simplify your query and make it so you could easily add new types without having to edit logic of a case statement. It may also make it cleaner to view your logic later.
CREATE TABLE table_persons (
`people` VARCHAR(5),
`type` VARCHAR(3),
`date` VARCHAR(10)
);
INSERT INTO table_persons
VALUES
('lisa', 'bot', '19-04-2022'),
('wayne', 'per', '19-04-2022');
CREATE TABLE info (
`type` VARCHAR(5),
`db_type` VARCHAR(5),
`info` VARCHAR(1)
);
insert into info
values
('bot', 'x_bot', 'x'),
('bot', 'x_bnt', 'x'),
('per','x_per','b');
and then you can easily do a join:
select
info.db_type,
info.info,
persons.date date
from
table_persons persons inner join info
on
info.type = persons.type
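To see the lookup-table join actually fill d1_info, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module instead of Hive, and the column types of d1_info are guessed, so treat it as an illustration of the idea rather than Hive-ready code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_persons (people TEXT, type TEXT, date TEXT);
INSERT INTO table_persons VALUES
  ('lisa',  'bot', '19-04-2022'),
  ('wayne', 'per', '19-04-2022');

-- mapping table: one row per output row wanted for each type
CREATE TABLE info (type TEXT, db_type TEXT, info TEXT);
INSERT INTO info VALUES
  ('bot', 'x_bot', 'x'),
  ('bot', 'x_bnt', 'x'),
  ('per', 'x_per', 'b');

-- target table (column types are an assumption)
CREATE TABLE d1_info (db_type TEXT, info TEXT, date TEXT);

-- the join fans 'bot' out into two rows and 'per' into one
INSERT INTO d1_info (db_type, info, date)
SELECT i.db_type, i.info, p.date
FROM table_persons p
JOIN info i ON i.type = p.type;
""")

rows = conn.execute(
    "SELECT db_type, info, date FROM d1_info ORDER BY db_type"
).fetchall()
for row in rows:
    print(row)
```

Adding a new type later only needs new rows in info; the INSERT itself stays unchanged.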

How to add a total row at the end of the table in t-sql?

I need to add a row of sums as the last row of the table. For example:
book_name | some_row1 | some_row2 | sum
---------------+---------------+---------------+----------
book1 | some_data11 | some_data12 | 100
book2 | some_data21 | some_data22 | 300
book3 | some_data31 | some_data32 | 500
total_books=3 | NULL | NULL | 900
How can I do this? (T-SQL)
You can use union all :
select book_name, some_row1, some_row2, sum
from table t
union all
select cast(count(*) as varchar(255)), null, null, sum(sum)
from table t;
However, count(*) gives the number of rows in the table; if book_name can also contain NULL values, you need count(book_name) instead of count(*).
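One thing to keep in mind with the UNION ALL approach is that row order is not guaranteed without an ORDER BY. Below is a small sketch that adds an explicit sort key so the total row always comes last. It assumes SQLite via Python's sqlite3 module; the table name books, the column name amount and the helper column is_total are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (book_name TEXT, some_row1 TEXT, some_row2 TEXT, amount INTEGER);
INSERT INTO books VALUES
  ('book1', 'some_data11', 'some_data12', 100),
  ('book2', 'some_data21', 'some_data22', 300),
  ('book3', 'some_data31', 'some_data32', 500);
""")

# is_total is a helper sort key: 0 for detail rows, 1 for the total row.
rows = conn.execute("""
    SELECT book_name, some_row1, some_row2, amount
    FROM (
        SELECT book_name, some_row1, some_row2, amount, 0 AS is_total
        FROM books
        UNION ALL
        SELECT 'total_books=' || COUNT(book_name), NULL, NULL, SUM(amount), 1
        FROM books
    )
    ORDER BY is_total, book_name
""").fetchall()

for row in rows:
    print(row)
```

In T-SQL the string concatenation would use + with a CAST instead of ||, but the sort-key idea is the same.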
Try with ROLLUP
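SELECT CASE
WHEN (GROUPING([book_name]) = 1) THEN 'total_books'
ELSE [book_name] END AS [book_name]
-- non-grouped columns must be aggregated; leave them NULL on the total row
,CASE WHEN GROUPING([book_name]) = 0 THEN MAX(some_row1) END AS some_row1
,CASE WHEN GROUPING([book_name]) = 0 THEN MAX(some_row2) END AS some_row2
,SUM([sum]) as Total_Sales
From Before
GROUP BY
[book_name] WITH ROLLUP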
I find that grouping sets is much more flexible than rollup. I would write this as:
select coalesce(book_name,
replace('total_books=#x', '#x', count(*))
) as book_name,
col2, col3, sum(whatever)
from t
group by grouping sets ( (book_name), () );
Strictly speaking, the GROUPING function with a CASE is better than COALESCE(). However, NULL values in the grouping keys are quite rare.

Count and name content from a SQL Server table

I have a table which is structured like this:
+-----+-------------+-------------------------+
| id | name | timestamp |
+-----+-------------+-------------------------+
| 1 | someName | 2016-04-20 09:41:41.213 |
| 2 | someName | 2016-04-20 09:42:41.213 |
| 3 | anotherName | 2016-04-20 09:43:41.213 |
| ... | ... | ... |
+-----+-------------+-------------------------+
Now, I am trying to create a query, which selects all timestamps since time x and count the amount of times the same name occurs in the result.
As an example, if we would apply this query to the table above, with 2016-04-20 09:40:41.213 as the date from which on it should be counted, the result should look like this:
+-------------+-------+
| name | count |
+-------------+-------+
| someName | 2 |
| anotherName | 1 |
+-------------+-------+
What I have accomplished so far is the following query, which gives me the names, but not their count:
WITH screenshots AS
(
SELECT * FROM SavedScreenshotsLog
WHERE timestamp > '2016-04-20 09:40:41.213'
)
SELECT s.name
FROM SavedScreenshotsLog s
INNER JOIN screenshots sc ON sc.name = s.name AND sc.timestamp = s.timestamp
ORDER BY s.name
I have browsed through stackoverflow but was not able to find a solution which fits my needs and as I am not very experienced with SQL, I am out of ideas.
You mention one table in your question, and then show a query with two tables. That makes it hard to follow the question.
What you are asking for is a simple aggregation:
SELECT name, COUNT(*)
FROM SavedScreenshotsLog
WHERE timestamp > '2016-04-20 09:40:41.213'
GROUP BY name
ORDER BY COUNT(*) DESC;
EDIT:
If you want "0" values, you can use conditional aggregation:
SELECT name,
SUM(CASE WHEN timestamp > '2016-04-20 09:40:41.213' THEN 1 ELSE 0 END) as cnt
FROM SavedScreenshotsLog
GROUP BY name
ORDER BY cnt DESC;
Note that this will run slower because there is no filter on the dates prior to aggregation.
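The difference between the two queries only shows up when some name has no rows after the cutoff. A quick sketch of that, assuming SQLite via Python's sqlite3 module with timestamps stored as ISO text (so string comparison orders them correctly); the oldName row is invented to trigger the zero case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SavedScreenshotsLog (id INTEGER, name TEXT, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO SavedScreenshotsLog VALUES (?, ?, ?)",
    [
        (1, "someName", "2016-04-20 09:41:41.213"),
        (2, "someName", "2016-04-20 09:42:41.213"),
        (3, "anotherName", "2016-04-20 09:43:41.213"),
        (4, "oldName", "2016-04-19 08:00:00.000"),  # hypothetical row before the cutoff
    ],
)
cutoff = "2016-04-20 09:40:41.213"

# Filter first, then aggregate: names with no matching rows disappear.
filtered = conn.execute("""
    SELECT name, COUNT(*)
    FROM SavedScreenshotsLog
    WHERE timestamp > ?
    GROUP BY name
    ORDER BY COUNT(*) DESC, name
""", (cutoff,)).fetchall()

# Conditional aggregation: every name is kept, possibly with a count of 0.
conditional = conn.execute("""
    SELECT name,
           SUM(CASE WHEN timestamp > ? THEN 1 ELSE 0 END) AS cnt
    FROM SavedScreenshotsLog
    GROUP BY name
    ORDER BY cnt DESC, name
""", (cutoff,)).fetchall()

print(filtered)
print(conditional)
```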
CREATE TABLE #TEST (name varchar(100), dt datetime)
INSERT INTO #TEST VALUES ('someName','2016-04-20 09:41:41.213')
INSERT INTO #TEST VALUES ('someName','2016-04-20 09:41:41.213')
INSERT INTO #TEST VALUES ('anotherName','2016-04-20 09:43:41.213')
declare @YourDatetime datetime = '2016-04-20 09:41:41.213'
SELECT name, count(dt)
FROM #TEST
WHERE dt >= @YourDatetime
GROUP BY name
I've posted this answer because the query above can raise errors when converting the string in the WHERE clause to a datetime, depending on the format of the datetime.

SQL - group by both bits

I have an SQL table with one bit column. If only one bit value (in my case 1) occurs in the rows of the table, how can I make a SELECT statement which shows, for example, the occurrence of both bit values in the table, even if the other does not occur? This is the result I'm trying to achieve:
+----------+--------+
| IsItTrue | AMOUNT |
+----------+--------+
| 1 | 12 |
| 0 | NULL |
+----------+--------+
I already tried to google the answer but without success, as English is not my native language and I am not that familiar with SQL jargon.
select IsItTrue, count(id) as Amount from
(select IsItTrue, id from table
union
select 1 as IsItTrue, null as id
union
select 0 as IsItTrue, null as id) t
group by IsItTrue
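Another way to get the same effect is to LEFT JOIN the table to a tiny derived table holding both bit values. The sketch below assumes SQLite via Python's sqlite3 module; the table name flags is made up, and NULLIF is used only because the desired output shows NULL rather than 0 for the missing value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flags (id INTEGER PRIMARY KEY, IsItTrue INTEGER)")
# twelve rows, all with the bit set to 1
conn.executemany("INSERT INTO flags (IsItTrue) VALUES (?)", [(1,)] * 12)

rows = conn.execute("""
    SELECT b.IsItTrue,
           NULLIF(COUNT(f.id), 0) AS Amount   -- turn a count of 0 into NULL
    FROM (SELECT 1 AS IsItTrue UNION ALL SELECT 0) AS b
    LEFT JOIN flags f ON f.IsItTrue = b.IsItTrue
    GROUP BY b.IsItTrue
    ORDER BY b.IsItTrue DESC
""").fetchall()

print(rows)  # [(1, 12), (0, None)]
```

Dropping the NULLIF gives 0 instead of NULL for the value that does not occur.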

In SQL, what's the difference between count(column) and count(*)?

I have the following query:
select column_name, count(column_name)
from table
group by column_name
having count(column_name) > 1;
What would be the difference if I replaced all calls to count(column_name) to count(*)?
This question was inspired by How do I find duplicate values in a table in Oracle?.
To clarify the accepted answer (and maybe my question), replacing count(column_name) with count(*) would return an extra row in the result that contains a null and the count of null values in the column.
count(*) counts NULLs and count(column) does not
[edit] added this code so that people can run it
create table #bla(id int,id2 int)
insert #bla values(null,null)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,null)
select count(*),count(id),count(id2)
from #bla
results
7 3 2
Another minor difference, between using * and a specific column, is that in the column case you can add the keyword DISTINCT, and restrict the count to distinct values:
select column_a, count(distinct column_b)
from table
group by column_a
having count(distinct column_b) > 1;
A further and perhaps subtle difference is that in some database implementations the count(*) is computed by looking at the indexes on the table in question rather than the actual data rows. Since no specific column is specified, there is no need to bother with the actual rows and their values (as there would be if you counted a specific column). Allowing the database to use the index data can be significantly faster than making it count "real" rows.
The explanation in the docs helps to clarify this:
COUNT(*) returns the number of items in a group, including NULL values and duplicates.
COUNT(expression) evaluates expression for each row in a group and returns the number of nonnull values.
So count(*) includes nulls, the other method doesn't.
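Applied to the duplicate-finding query from the question, this means count(*) can return an extra row for NULL. A small sketch of that, assuming SQLite via Python's sqlite3 module, with a hypothetical table t and made-up values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column_name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("a",), ("a",), ("b",), (None,), (None,)],
)

# COUNT(column_name) is 0 for the NULL group, so HAVING filters it out.
with_column = conn.execute("""
    SELECT column_name, COUNT(column_name)
    FROM t
    GROUP BY column_name
    HAVING COUNT(column_name) > 1
    ORDER BY column_name
""").fetchall()

# COUNT(*) counts the rows in the NULL group too, so it shows up as a "duplicate".
with_star = conn.execute("""
    SELECT column_name, COUNT(*)
    FROM t
    GROUP BY column_name
    HAVING COUNT(*) > 1
    ORDER BY column_name
""").fetchall()

print(with_column)  # [('a', 2)]
print(with_star)    # [(None, 2), ('a', 2)]
```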
We can use the Stack Exchange Data Explorer to illustrate the difference with a simple query. The Users table in Stack Overflow's database has columns that are often left blank, like the user's Website URL.
-- count(column_name) vs. count(*)
-- Illustrates the difference between counting a column
-- that can hold null values, a 'not null' column, and count(*)
select count(WebsiteUrl), count(Id), count(*) from Users
If you run the query above in the Data Explorer, you'll see that the count is the same for count(Id) and count(*) because the Id column doesn't allow null values. The WebsiteUrl count is much lower, though, because that column allows null.
COUNT(*) tells SQL Server to count all the rows in a table, including those containing NULLs.
COUNT(column_name) counts only the rows that have a non-null value in that column.
The following code, tested on SQL Server 2008, demonstrates this:
-- Table variable
DECLARE @Table TABLE
(
CustomerId int NULL
, Name nvarchar(50) NULL
)
-- Insert some records for tests
INSERT INTO @Table VALUES( NULL, 'Pedro')
INSERT INTO @Table VALUES( 1, 'Juan')
INSERT INTO @Table VALUES( 2, 'Pablo')
INSERT INTO @Table VALUES( 3, 'Marcelo')
INSERT INTO @Table VALUES( NULL, 'Leonardo')
INSERT INTO @Table VALUES( 4, 'Ignacio')
-- Count all the rows by indicating *
SELECT COUNT(*) AS 'AllRowsCount'
FROM @Table
-- Count only rows with a non-NULL CustomerId (excludes NULLs)
SELECT COUNT(CustomerId) AS 'OnlyNotNullCounts'
FROM @Table
COUNT(*) – Returns the total number of records in a table (Including NULL valued records).
COUNT(Column Name) – Returns the total number of Non-NULL records. It means that, it ignores counting NULL valued records in that particular column.
Basically, COUNT(*) counts all the rows of a table, whereas COUNT(COLUMN_NAME) excludes rows where that column is NULL, as the other answers here have already said.
The more interesting point concerns optimization: unless you are doing multiple counts or a complex query, prefer COUNT(*) to COUNT(COLUMN_NAME), since counting a specific column can lower your DB performance when dealing with a large amount of data.
Further elaborating upon the answers given by @SQLMenace and @Brannon, here is an example that uses the GROUP BY clause, which the OP mentioned but which is not present in the answer by @SQLMenace:
CREATE TABLE table1 (
id INT
);
INSERT INTO table1 VALUES
(1),
(2),
(NULL),
(2),
(NULL),
(3),
(1),
(4),
(NULL),
(2);
SELECT * FROM table1;
+------+
| id |
+------+
| 1 |
| 2 |
| NULL |
| 2 |
| NULL |
| 3 |
| 1 |
| 4 |
| NULL |
| 2 |
+------+
10 rows in set (0.00 sec)
SELECT id, COUNT(*) FROM table1 GROUP BY id;
+------+----------+
| id | COUNT(*) |
+------+----------+
| 1 | 2 |
| 2 | 3 |
| NULL | 3 |
| 3 | 1 |
| 4 | 1 |
+------+----------+
5 rows in set (0.00 sec)
Here, COUNT(*) counts the number of occurrences of each type of id including NULL
SELECT id, COUNT(id) FROM table1 GROUP BY id;
+------+-----------+
| id | COUNT(id) |
+------+-----------+
| 1 | 2 |
| 2 | 3 |
| NULL | 0 |
| 3 | 1 |
| 4 | 1 |
+------+-----------+
5 rows in set (0.00 sec)
Here, COUNT(id) counts the number of occurrences of each type of id but does not count the number of occurrences of NULL
SELECT id, COUNT(DISTINCT id) FROM table1 GROUP BY id;
+------+--------------------+
| id | COUNT(DISTINCT id) |
+------+--------------------+
| NULL | 0 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
+------+--------------------+
5 rows in set (0.00 sec)
Here, COUNT(DISTINCT id) counts the number of occurrences of each type of id only once (does not count duplicates) and also does not count the number of occurrences of NULL
Some people recommend using
Count(1) in place of a column name or *
to count the number of rows in a table, on the grounds that it never has to check whether a column name exists in the table. In practice, most database engines treat Count(1) and Count(*) identically, so any difference is usually negligible.
When counting a single fixed column there is no difference; if you want to count over more than one column, you have to specify which columns you need to count.
As mentioned in the previous answers, Count(*) counts every row, even those containing NULLs, whereas count(Columnname) counts only the rows where that column has a value.
It's always best practice to avoid * (Select *, count *, …)