Return list of Students by ZipCode Count - sql

I am trying to get a list of students that live in the same zip code where zip code count > 1.
I tried the following and get nothing in my query. If I remove s.Student, I get results of zipcode and count, but I want to include student also.
SELECT s.Student, z.ZipCode, COUNT(s.ZipCodeId) As 'Zip Code Count'
FROM Students s
INNER JOIN ZipCodes z ON z.ZipCodeId = s.ZipCodeId
GROUP BY s.Student, z.ZipCode
HAVING COUNT(z.ZipCode) > 1
Below are the database tables I am using.
CREATE TABLE [dbo].[Instructors](
[InstructorId] [int] IDENTITY(1,1) NOT NULL,
[Instructor] [varchar](50) NOT NULL,
[ZipCodeId] [int] NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Students](
[StudentId] [int] IDENTITY(1,1) NOT NULL,
[Student] [varchar](50) NOT NULL,
[ZipCodeId] [int] NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[ZipCodes](
[ZipCodeId] [int] IDENTITY(1,1) NOT NULL,
[ZipCode] [varchar](9) NULL,
[City] [varchar](50) NULL,
[State] [varchar](25) NULL
) ON [PRIMARY]

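Grouping by s.Student puts each student (assuming names are unique) in a group of their own, so the count never exceeds 1 and HAVING filters every row out. Instead, first find the zip codes that are used more than once, then join Students and the zip code details onto that result, e.g.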
SELECT S.Student, Z.ZipCode, Z1.Num AS "Zip Code Count"
FROM (
SELECT COUNT(*) Num, ZipCodeId
FROM Students S
GROUP BY ZipCodeId
HAVING COUNT(*) > 1
) Z1
INNER JOIN Students S on S.ZipCodeId = Z1.ZipCodeId
INNER JOIN ZipCodes Z on Z.ZipCodeId = Z1.ZipCodeId;
Note: Avoid single quotes (') for delimiting a column name, because they denote string literals. Use double quotes (") or square brackets ([]) instead.
Also, sample data would allow testing of our solutions.

You can do this using a window function, without re-joining:
SELECT
S.Student,
Z.ZipCode,
S.Num AS [Zip Code Count]
FROM (
SELECT *,
COUNT(*) OVER (PARTITION BY S.ZipCodeId) Num
FROM Students S
) S
INNER JOIN ZipCodes Z on Z.ZipCodeId = S.ZipCodeId
WHERE S.Num > 1;
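Since the question came without sample data, here is a minimal sketch for trying the window-function approach. The table variables, student names, and zip codes are made up for illustration and only mirror the columns the query needs.

```sql
-- Hypothetical sample data; names and zip codes are invented for illustration.
DECLARE @ZipCodes TABLE (ZipCodeId int PRIMARY KEY, ZipCode varchar(9));
DECLARE @Students TABLE (StudentId int PRIMARY KEY, Student varchar(50), ZipCodeId int);

INSERT INTO @ZipCodes (ZipCodeId, ZipCode)
VALUES (1, '10001'), (2, '60601'), (3, '94105');

INSERT INTO @Students (StudentId, Student, ZipCodeId)
VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cara', 2), (4, 'Dev', 3), (5, 'Eli', 3);

-- Count students per zip code with a window function,
-- then keep only the zip codes shared by more than one student.
SELECT S.Student, Z.ZipCode, S.Num AS [Zip Code Count]
FROM (
    SELECT St.Student, St.ZipCodeId,
           COUNT(*) OVER (PARTITION BY St.ZipCodeId) AS Num
    FROM @Students AS St
) AS S
INNER JOIN @ZipCodes AS Z ON Z.ZipCodeId = S.ZipCodeId
WHERE S.Num > 1
ORDER BY Z.ZipCode, S.Student;
-- Returns Ann and Bob (10001) and Dev and Eli (94105), each with a count of 2.
-- Cara is excluded because no other student shares 60601.
```

Run the whole script as one batch, because table variables only live for the duration of the batch that declares them.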

Related

How can I join 2 tables when the key is in different columns

I have 2 tables: accounthierarchy and accountvaluetotal.
The link between the 2 tables is the account number, and I want to join them on it. But in the table "accounthierarchy" the account number sits at a different level (column) from row to row.
Can you please help me with how to do it? Thanks
CREATE TABLE [dbo].[accounthierarchy](
[ID] [int] NULL,
[level1] [int] NULL,
[level2] [int] NULL,
[level3] [int] NULL,
[level4] [int] NULL,
[level5] [int] NULL)
INSERT INTO [dbo].[accounthierarchy] (ID,level1,Level2,level3,level4,level5)
VALUES
(1,100,null,null,null,null),
(2,100,110,null,null,null),
(3,100,110,1110,null,null),
(4,200,220,null,null,null),
(5,200,230,null,null,null),
(5,200,240,null,null,null),
(6,200,240,2410,null,null)
CREATE TABLE [dbo].[accountvaluetotal](
[accountnumber] [int] NULL,
[values] [int] NULL
)
insert into [dbo].[accountvaluetotal]
values
(1110,5000),
(220,7400),
(230,6200),
(2410,5600)
You can use INNER JOIN:
SELECT *
FROM accounthierarchy
INNER JOIN accountvaluetotal
ON level3=accountnumber;
Assuming the account number is the 'last' non-null level in the accounthierarchy table:
SELECT ac.ID, COALESCE(ac.level5,ac.level4,ac.level3,ac.level2,ac.level1) as AccountNumber
from [accounthierarchy] ac
This gives you the account number for each row, which you can then use in a standard join.
COALESCE returns the first non-null value from its list of arguments, so by going from level5 down to level1 it returns the deepest level that is populated.
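The COALESCE expression can also go straight into the join condition. The sketch below loads the question's sample data into table variables (an assumption made only so the script runs on its own) and joins the two tables on the deepest populated level.

```sql
-- Sample data copied from the question (the duplicate ID 5 is kept as posted).
DECLARE @accounthierarchy TABLE (ID int, level1 int, level2 int, level3 int, level4 int, level5 int);
DECLARE @accountvaluetotal TABLE (accountnumber int, [values] int);

INSERT INTO @accounthierarchy (ID, level1, level2, level3, level4, level5)
VALUES (1, 100, NULL, NULL, NULL, NULL),
       (2, 100, 110,  NULL, NULL, NULL),
       (3, 100, 110,  1110, NULL, NULL),
       (4, 200, 220,  NULL, NULL, NULL),
       (5, 200, 230,  NULL, NULL, NULL),
       (5, 200, 240,  NULL, NULL, NULL),
       (6, 200, 240,  2410, NULL, NULL);

INSERT INTO @accountvaluetotal (accountnumber, [values])
VALUES (1110, 5000), (220, 7400), (230, 6200), (2410, 5600);

-- Join on the deepest non-null level of each hierarchy row.
SELECT ah.ID, av.accountnumber, av.[values]
FROM @accounthierarchy AS ah
INNER JOIN @accountvaluetotal AS av
    ON av.accountnumber = COALESCE(ah.level5, ah.level4, ah.level3, ah.level2, ah.level1)
ORDER BY ah.ID;
-- Returns four rows: (3, 1110, 5000), (4, 220, 7400), (5, 230, 6200), (6, 2410, 5600).
```

Rows whose deepest level has no entry in accountvaluetotal (100, 110 and 240 here) drop out of the INNER JOIN; a LEFT JOIN would keep them with NULL values.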

Add SQL SELECT COUNT result to an existing column as text

I have two tables:
1. TABLE [dbo].[ItemCategories](
[Id] [int] IDENTITY(1,1) NOT NULL,
[CategoryId] [int] NULL,
[StockId] [int] NULL,
2. TABLE [dbo].[Categories](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ParentCategoryId] [int] NULL,
[CategoryName] [nvarchar](100) NULL,
[Slug] [nvarchar](150) NULL
And this query in SQL Server 2012
SELECT [CategoryName], [Slug], [ParentCategoryId], [Id]
FROM [Categories]
ORDER BY [ParentCategoryId] DESC
Which returns these rows
[CategoryName] [Slug] [ParentCategoryId] [Id]
Exercise exercise 42 46
Fashion fashion 42 47
And I have a second query:
SELECT COUNT(*)
FROM [ItemCategories]
WHERE CategoryId = '46' -- this Id is the same as [Id] from the first query
How can I modify the first query to add the total count from the second query to the returned CategoryName column (as a single string)?
Like this:
[CategoryName] [Slug] [ParentCategoryId] [Id]
Exercise (31) exercise 42 46
Fashion (56) fashion 42 47
I have created this join, but I don't know how to add the COUNT(*) as text
SELECT [CategoryName], [Slug], [ParentCategoryId], [Categories].[Id]
FROM [Categories]
INNER JOIN [ItemCategories] ON [Categories].[Id]=[ItemCategories].[CategoryId]
ORDER BY [ParentCategoryId] DESC
You can use the count(*) window function. I would put it in a separate column, but you can do:
SELECT [CategoryName] + ' (' + cast(count(*) over (partition by Id) as varchar(255)) + ')',
[Slug], [ParentCategoryId], [Id]
FROM [Categories]
ORDER BY [ParentCategoryId] DESC;
EDIT:
For two tables, use a JOIN and GROUP BY:
SELECT c.CategoryName + ' (' + cast(count(ic.Id) as varchar(255)) + ')',
c.Slug, c.ParentCategoryId, c.Id
FROM Categories c LEFT JOIN
ItemCategories ic
on ic.CategoryId = c.Id
GROUP BY c.CategoryName, c.slug, c.ParentCategoryId, c.id
ORDER BY ParentCategoryId DESC;
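To see how the LEFT JOIN version behaves for a category with no items, here is a small self-contained sketch. The table variables, the extra "Garden" category, and the item rows are invented for illustration. It uses CONCAT, which is available from SQL Server 2012 and converts the integer count to text implicitly.

```sql
-- Hypothetical data: two categories from the question plus one with no items.
DECLARE @Categories TABLE (Id int PRIMARY KEY, ParentCategoryId int,
                           CategoryName nvarchar(100), Slug nvarchar(150));
DECLARE @ItemCategories TABLE (Id int IDENTITY(1,1) PRIMARY KEY, CategoryId int, StockId int);

INSERT INTO @Categories (Id, ParentCategoryId, CategoryName, Slug)
VALUES (46, 42, N'Exercise', N'exercise'),
       (47, 42, N'Fashion',  N'fashion'),
       (48, 42, N'Garden',   N'garden');

INSERT INTO @ItemCategories (CategoryId, StockId)
VALUES (46, 1), (46, 2), (47, 3);

-- COUNT(ic.Id) counts only matched item rows, so an empty category shows 0.
-- COUNT(*) would count the NULL-extended row from the LEFT JOIN and show 1.
SELECT CONCAT(c.CategoryName, N' (', COUNT(ic.Id), N')') AS CategoryName,
       c.Slug, c.ParentCategoryId, c.Id
FROM @Categories AS c
LEFT JOIN @ItemCategories AS ic ON ic.CategoryId = c.Id
GROUP BY c.CategoryName, c.Slug, c.ParentCategoryId, c.Id
ORDER BY c.ParentCategoryId DESC, c.Id;
-- CategoryName column: Exercise (2), Fashion (1), Garden (0)
```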

Query is very very slow for processing 200000 plus records

I have 200,000 rows in Patient & Person table, and the query shown takes 30 secs to execute.
I have defined the primary key (and clustered index) in the Person table on PersonId and on PatientId in the Patient table. What else can I do here to improve performance of my procedure?
I am new to database development and know only basic SQL. I am also not sure whether SQL Server can handle 200,000 rows quickly.
Whole dynamic Procedure you can see at https://github.com/Padayappa/SQLProblem/blob/master/Performance
Has anyone dealt with this many rows? How do I improve performance here?
DECLARE @return_value int,
@unitRows bigint,
@unitPages int,
@TenantId int,
@unitItems int,
@page int
SET @TenantId = 1
SET @unitItems = 20
SET @page = 1
DECLARE @PatientSearch TABLE(
[PatientId] [bigint] NOT NULL,
[PatientIdentifier] [nvarchar](50) NULL,
[PersonNumber] [nvarchar](20) NULL,
[FirstName] [nvarchar](100) NOT NULL,
[LastName] [nvarchar](100) NOT NULL,
[ResFirstName] [nvarchar](100) NOT NULL,
[ResLastName] [nvarchar](100) NOT NULL,
[AddFirstName] [nvarchar](100) NOT NULL,
[AddLastName] [nvarchar](100) NOT NULL,
[Address] [nvarchar](255) NULL,
[City] [nvarchar](50) NULL,
[State] [nvarchar](50) NULL,
[ZipCode] [nvarchar](20) NULL,
[Country] [nvarchar](50) NULL,
[RowNumber] [bigint] NULL
)
INSERT INTO @PatientSearch SELECT PAT.PatientId
,PAT.PatientIdentifier
,PER.PersonNumber
,PER.FirstName
,PER.LastName
,RES_PER.FirstName AS ResFirstName
,RES_PER.LastName AS ResLastName
,ADD_PER.FirstName AS AddFirstName
,ADD_PER.LastName AS AddLastName
,PER.Address
,PER.City
,PER.State
,PER.ZipCode
,PER.Country
,ROW_NUMBER() OVER (ORDER BY PAT.PatientId DESC) AS RowNumber
FROM dbo.Patient AS PAT
INNER JOIN dbo.Person AS PER
ON PAT.PersonId = PER.PersonId
INNER JOIN dbo.Person AS RES_PER
ON PAT.ResponsiblePersonId = RES_PER.PersonId
INNER JOIN dbo.Person AS ADD_PER
ON PAT.AddedBy = ADD_PER.PersonId
INNER JOIN dbo.Booking AS B
ON PAT.PatientId = B.PatientId
WHERE PAT.TenantId = @TenantId AND B.CategoryId = @CategoryId
GROUP BY PAT.PatientId
,PAT.PatientIdentifier
,PER.PersonNumber
,PER.FirstName
,PER.LastName
,RES_PER.FirstName
,RES_PER.LastName
,ADD_PER.FirstName
,ADD_PER.LastName
,PER.Address
,PER.City
,PER.State
,PER.ZipCode
,PER.Country
;
SELECT @unitRows = @@ROWCOUNT
,@unitPages = (@unitRows / @unitItems) + 1;
SELECT *
FROM @PatientSearch AS IT
WHERE RowNumber BETWEEN (@page - 1) * @unitItems + 1 AND @unitItems * @page
Well, unless I am missing something (like duplicate rows?) you should be able to remove the GROUP BY
GROUP BY PAT.PatientId
,PAT.PatientIdentifier
,PER.PersonNumber
,PER.FirstName
,PER.LastName
,RES_PER.FirstName
,RES_PER.LastName
,ADD_PER.FirstName
,ADD_PER.LastName
,PER.Address
,PER.City
,PER.State
,PER.ZipCode
,PER.Country
as you are grouping by all fields in the select list, and ROW_NUMBER() is ordered by PAT.PatientId, which the GROUP BY already contains. The GROUP BY would only matter if the join to Booking can return several rows per patient.
Further to that, you should create indexes on the tables, containing the columns that you join or filter on.
So for instance I would create an index on table Patient with columns (TenantId,PersonId,ResponsiblePersonId,AddedBy) with included columns (PatientId,PatientIdentifier)
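Written out as a statement, the suggested index might look like the sketch below. The index name is hypothetical, and it assumes dbo.Patient has the columns used in the question's query; whether it actually helps should be checked against the execution plan.

```sql
-- Hypothetical index name; key and included columns follow the suggestion above.
-- Assumes dbo.Patient exists with these columns, as in the question's query.
CREATE NONCLUSTERED INDEX IX_Patient_TenantId_PersonIds
    ON dbo.Patient (TenantId, PersonId, ResponsiblePersonId, AddedBy)
    INCLUDE (PatientId, PatientIdentifier);
```

Because PatientId is the clustered key, SQL Server carries it in every nonclustered index anyway, so listing it under INCLUDE is harmless but redundant.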
Frankly speaking, 200,000 rows is nothing to SQL Server. First remove the logical redundancy: you have primary keys, so why still group by so many columns, and why do you need to join the same table (Person) three times? After that, create at least some composite or covering (INCLUDE) indexes. Look at the execution plan in SSMS (Ctrl+L for the estimated plan, Ctrl+M to include the actual plan) to see which indexes are missing. If you need further help, please post your table schema with a few rows of sample data.

Return rows which have same data in three columns

I have a table with the following schema
CREATE TABLE [dbo].[personas](
[id_persona] [int] IDENTITY(1,1) NOT NULL,
[nombres] [nvarchar](50) NOT NULL,
[apellido_paterno] [nvarchar](50) NULL,
[apellido_materno] [nvarchar](50) NULL,
[fecha_nacimiento] [date] NOT NULL,
[sexo] [varchar](1) NOT NULL,
[estado_civil] [nvarchar](50) NOT NULL,
[calle] [nvarchar](200) NULL,
[colonia] [nvarchar](100) NULL,
[codigo_postal] [char](5) NOT NULL,
[telefonos] [varchar](50) NULL,
[celular] [varchar](25) NULL,
[email] [varchar](50) NULL,
)
How do I make a query in SQL Server to return rows where nombres, apellido_paterno and apellido_materno are repeated? I mean two or more rows have the same data in these columns.
I suppose I'm looking for something like the opposite of the DISTINCT clause
You would want...
SELECT nombres, apellido_paterno, apellido_materno
FROM dbo.personas
GROUP BY nombres, apellido_paterno, apellido_materno
HAVING COUNT(*) > 1
If you want to look at the actual rows, then use that as an inner query and join onto it. So, something like
SELECT *
FROM personas pOuter INNER JOIN
(SELECT nombres, apellido_paterno, apellido_materno
FROM dbo.personas
GROUP BY nombres, apellido_paterno, apellido_materno
HAVING COUNT(*) > 1) pInner
ON pInner.nombres = pOuter.nombres
AND pInner.apellido_paterno = pOuter.apellido_paterno
AND pInner.apellido_materno = pOuter.apellido_materno
;WITH x AS
(
SELECT id_persona, rn = ROW_NUMBER() OVER
(
PARTITION BY nombres, apellido_paterno, apellido_materno
ORDER BY id_persona
)
FROM dbo.personas
)
SELECT <col list>
FROM dbo.personas AS p
WHERE EXISTS
(
SELECT 1 FROM x
WHERE x.id_persona = p.id_persona
-- rn > 1 keeps only the second and later copies within each group
AND x.rn > 1
);
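If every row of a duplicated group is wanted, including the first copy, a windowed COUNT is another option. This is a minimal sketch on a cut-down table variable with made-up names, not the full personas schema. PARTITION BY places NULL surnames in the same group, whereas an equality join on a nullable column does not match NULL to NULL.

```sql
-- Hypothetical, reduced version of the personas table with invented rows.
DECLARE @personas TABLE (
    id_persona int IDENTITY(1,1) PRIMARY KEY,
    nombres nvarchar(50) NOT NULL,
    apellido_paterno nvarchar(50) NULL,
    apellido_materno nvarchar(50) NULL
);

INSERT INTO @personas (nombres, apellido_paterno, apellido_materno)
VALUES (N'Ana',  N'Lopez', N'Diaz'),
       (N'Ana',  N'Lopez', N'Diaz'),
       (N'Luis', N'Perez', NULL),
       (N'Luis', N'Perez', NULL),
       (N'Eva',  N'Ruiz',  N'Soto');

-- Count the rows sharing the same three name columns, then keep groups of 2 or more.
SELECT d.id_persona, d.nombres, d.apellido_paterno, d.apellido_materno
FROM (
    SELECT p.*,
           COUNT(*) OVER (PARTITION BY p.nombres, p.apellido_paterno, p.apellido_materno) AS copies
    FROM @personas AS p
) AS d
WHERE d.copies > 1
ORDER BY d.id_persona;
-- Returns id_persona 1 to 4; Eva (id 5) has no duplicate and is left out.
```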

What's the right way of joining two tables, grouping by a column, and selecting only one row for each record?

I have a crews table
CREATE TABLE crew(crew_id INT, crew_name nvarchar(20))
And a time log table, which is just a very long list of actions performed by the crew
CREATE TABLE [dbo].[TimeLog](
[time_log_id] [int] IDENTITY(1,1) NOT NULL,
[experiment_id] [int] NOT NULL,
[crew_id] [int] NOT NULL,
[starting] [bit] NULL,
[ending] [bit] NULL,
[exception] [nchar](10) NULL,
[sim_time] [time](7) NULL,
[duration] [int] NULL,
[real_time] [datetime] NOT NULL )
I want to have a view that shows only one row for each crew with the latest sim_time + duration .
Is a view the way to go? If yes, how do I write it? If not, what's the best way of doing this?
Thanks
Here is a query to select what you want:
select * from (
select
c.crew_name,
l.*,
row_number() over (partition by c.crew_id order by l.sim_time desc) as rNum
from crew as c
inner join TimeLog as l on c.crew_id = l.crew_id
) as t
where rNum = 1
It depends on what you need the data for.
Anyway, a simple query to find the latest sim_time would be something like
select C.*, TL.sim_time
from crew C /*left? right? inner?*/ join TimeLog TL on TL.crew_id = C.crew_id
where TL.sim_time in (select max(timelog_subquery.sim_time) from TimeLog timelog_subquery where timelog_subquery.crew_id = C.crew_id )
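On the question of whether a view fits: a view can wrap the ROW_NUMBER approach so callers simply select from it. The sketch below assumes the crew and TimeLog tables from the question exist in the dbo schema; the view name is hypothetical. It orders by sim_time only, so if "latest" should also take duration into account, the ORDER BY would need adjusting.

```sql
-- Assumes dbo.crew and dbo.TimeLog exist as defined in the question.
CREATE VIEW dbo.LatestCrewLog  -- hypothetical view name
AS
SELECT t.crew_id, t.crew_name, t.sim_time, t.duration
FROM (
    SELECT c.crew_id, c.crew_name, l.sim_time, l.duration,
           -- Number each crew's log rows, newest sim_time first.
           ROW_NUMBER() OVER (PARTITION BY c.crew_id ORDER BY l.sim_time DESC) AS rNum
    FROM dbo.crew AS c
    INNER JOIN dbo.TimeLog AS l ON l.crew_id = c.crew_id
) AS t
WHERE t.rNum = 1;  -- keep only the newest row per crew
GO

-- Usage: one row per crew that has at least one log entry.
SELECT crew_id, crew_name, sim_time, duration
FROM dbo.LatestCrewLog;
```

Crews without any TimeLog rows do not appear because of the INNER JOIN; switching to a LEFT JOIN would list them with NULL times.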