SQL groupby multiple columns - sql

Tablename: EntryTable
ID CharityName Title VoteCount
1 save the childrens save them 1
2 save the childrens saving childrens 3
3 cancer research support them 10
Tablename: ContestantTable
ID FirstName LastName EntryId
1 Neville Vyland 1
2 Abhishek Shukla 1
3 Raghu Nandan 2
Desired output
CharityName FullName
save the childrens Neville Vyland
Abhishek Shukla
cancer research Raghu Nandan
I tried
select LOWER(ET.CharityName) AS CharityName,COUNT(CT.FirstName) AS Total_No_Of_Contestant
from EntryTable ET
join ContestantTable CT
on ET.ID = CT.ID
group by LOWER(ET.CharityName)
Please advise.

Please have a look at this sqlfiddle.
Have a try with this query:
SELECT
e.CharityName,
c.FirstName,
c.LastName,
sq.my_count
FROM
EntryTable e
INNER JOIN ContestantTable c ON e.ID = c.EntryId
INNER JOIN (
SELECT EntryId, COUNT(*) AS my_count FROM ContestantTable GROUP BY EntryId
) sq ON e.ID = sq.EntryId
I assumed you actually wanted to join with ContestantTable's EntryId column. It made more sense to me. Either way (joining my way or yours) your sample data is faulty.
Apart from that, you wanted to suppress repeating CharityNames. That is not the job of SQL: the database is there to store and retrieve the data, not to format it nicely. You will want to work with the data in the application layer anyway, and removing repeated values in the query makes that job harder, not easier.
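To illustrate the point about formatting in the application layer, here is a minimal sketch. It assumes Python with the standard-library sqlite3 module as a stand-in for the real database (so string concatenation uses || instead of +), and format_report is a hypothetical helper name, not something from the question. The query returns the repeated charity names as-is, and the application blanks out the repeats.

```python
import sqlite3
from itertools import groupby

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EntryTable (ID INTEGER, CharityName TEXT, Title TEXT, VoteCount INTEGER);
CREATE TABLE ContestantTable (ID INTEGER, FirstName TEXT, LastName TEXT, EntryId INTEGER);
INSERT INTO EntryTable VALUES
    (1, 'save the childrens', 'save them', 1),
    (2, 'save the childrens', 'saving childrens', 3),
    (3, 'cancer research', 'support them', 10);
INSERT INTO ContestantTable VALUES
    (1, 'Neville', 'Vyland', 1),
    (2, 'Abhishek', 'Shukla', 1),
    (3, 'Raghu', 'Nandan', 2);
""")

# Plain join on EntryId; the database returns every row, repeats included.
rows = conn.execute("""
SELECT e.CharityName, c.FirstName || ' ' || c.LastName AS FullName
FROM EntryTable e
INNER JOIN ContestantTable c ON e.ID = c.EntryId
ORDER BY e.CharityName, c.ID
""").fetchall()


def format_report(rows):
    """Blank out the charity name on every row after the first of each group."""
    lines = []
    for charity, group in groupby(rows, key=lambda r: r[0]):
        for i, (_, full_name) in enumerate(group):
            lines.append((charity if i == 0 else "", full_name))
    return lines


report = format_report(rows)
for charity, full_name in report:
    print(f"{charity:<20}{full_name}")
```

With the join on EntryId, all three contestants end up under 'save the childrens', which differs from the desired output in the question. That is the inconsistency in the sample data mentioned above.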

Most people do not realize that T-SQL has some cool ranking functions that can be used with grouping. Many things like reports can be done in T-SQL.
The first part of the code below creates two local temporary tables and loads them with data for testing.
The second part of the code creates the report. I use two common table expressions (CTE). I could have used two more local temporary tables or table variables. It really does not matter with this toy example.
The cte_RankData has two columns RowNum and RankNum. If RowNum = RankNum, we are on the first instance of a charity. Print out the charity name and the total number of votes. Otherwise, print out blanks.
The name of the contestant and the votes for that contestant are shown on the detail lines. This is a typical report with subtotals shown at the top.
I think this matches the report output that you wanted. I ordered the contestants by most votes descending.
Sincerely
John Miner
www.craftydba.com
--
-- Create the tables
--
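-- Remove the tables if they already exist
if object_id('tempdb..#tbl_Entry') is not null drop table #tbl_Entry;
if object_id('tempdb..#tbl_Contestants') is not null drop table #tbl_Contestants;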
-- The entries table
Create table #tbl_Entry
(
ID int,
CharityName varchar(25),
Title varchar(25),
VoteCount int
);
-- Add data
Insert Into #tbl_Entry values
(1, 'save the childrens', 'save them', 1),
(2, 'save the childrens', 'saving childrens', 3),
(3, 'cancer research', 'support them', 10)
-- The contestants table
Create table #tbl_Contestants
(
ID int,
FirstName varchar(25),
LastName varchar(25),
EntryId int
);
-- Add data
Insert Into #tbl_Contestants values
(1, 'Neville', 'Vyland', 1),
(2, 'Abhishek', 'Shukla', 1),
(3, 'Raghu', 'Nandan', 2);
--
-- Create the report
--
;
with cte_RankData
as
(
select
ROW_NUMBER() OVER (ORDER BY E.CharityName ASC, VoteCount Desc) as RowNum,
RANK() OVER (ORDER BY E.CharityName ASC) AS RankNum,
E.CharityName as CharityName,
C.FirstName + ' ' + C.LastName as FullName,
E.VoteCount
from #tbl_Entry E inner join #tbl_Contestants C on E.ID = C.ID
),
cte_SumData
as
(
select
E.CharityName,
sum(E.VoteCount) as TotalCount
from #tbl_Entry E
group by E.CharityName
)
select
case when RowNum = RankNum then
R.CharityName
else
''
end as rpt_CharityName,
case when RowNum = RankNum then
str(S.TotalCount, 5, 0)
else
''
end as rpt_TotalVotes,
FullName as rpt_ContestantName,
VoteCount as rpt_Votes4Contestant
from cte_RankData R join cte_SumData S
on R.CharityName = S.CharityName
-- without an explicit order the detail lines may come back in any order
order by R.RowNum;
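The RowNum = RankNum test works because RANK() gives every row of a charity the position of that charity's first row, while ROW_NUMBER() keeps counting. The two agree only on the first row of each group. Here is a small sketch of just that part, assuming Python's sqlite3 (SQLite 3.25 or later for window functions) in place of SQL Server, plain tables instead of temp tables, and || instead of + for concatenation. The join condition is the same as in the report query.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_Entry (ID INTEGER, CharityName TEXT, Title TEXT, VoteCount INTEGER);
CREATE TABLE tbl_Contestants (ID INTEGER, FirstName TEXT, LastName TEXT, EntryId INTEGER);
INSERT INTO tbl_Entry VALUES
    (1, 'save the childrens', 'save them', 1),
    (2, 'save the childrens', 'saving childrens', 3),
    (3, 'cancer research', 'support them', 10);
INSERT INTO tbl_Contestants VALUES
    (1, 'Neville', 'Vyland', 1),
    (2, 'Abhishek', 'Shukla', 1),
    (3, 'Raghu', 'Nandan', 2);
""")

report = conn.execute("""
WITH cte_RankData AS (
    SELECT
        ROW_NUMBER() OVER (ORDER BY E.CharityName ASC, E.VoteCount DESC) AS RowNum,
        RANK() OVER (ORDER BY E.CharityName ASC) AS RankNum,
        E.CharityName AS CharityName,
        C.FirstName || ' ' || C.LastName AS FullName,
        E.VoteCount AS VoteCount
    FROM tbl_Entry E
    INNER JOIN tbl_Contestants C ON E.ID = C.ID
)
SELECT
    RowNum,
    RankNum,
    -- Show the charity only on the first row of its group
    CASE WHEN RowNum = RankNum THEN CharityName ELSE '' END AS rpt_CharityName,
    FullName,
    VoteCount
FROM cte_RankData
ORDER BY RowNum
""").fetchall()

for row in report:
    print(row)
```

The third row has RowNum 3 but RankNum 2, so its charity name is blanked.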

Pivot top 100 rows into columns for all tables in a database

My mission is to get top 100 rows from all my tables in a database, and union them into one result output.
Since the tables have varying columncount and varying types, the only way I see to do this is to put the row number as columns, and the column name in the first column. Row 1 from all tables will then be represented in column 2 and so forth.
My skills in SQL aren't good enough to figure this one out, so I hope the community can assist me in this.
Here is some code:
--Example table
create table Worker (Id int, FirstName nvarchar(max));
--Some data
insert into Worker values (1,'John'),(2,'Jane'),(3,'Elisa');
--Static pivot example
select * from (select
ROW_NUMBER() over (order by Id) as IdRows, FirstName from worker) as st
pivot
(
max(FirstName) for IdRows in ([1],[2],[3])
) as pt;
--Code to incorporate to get column name on column 1
select name from sys.columns where object_id('Worker') = object_id;
--Cleanup
drop table Worker;
I reckon the static pivot must be dynamic, which is not a problem. The above code is just to create a proof of concept, so I have something to build further on.
End result of query should be like this:
Column 1 2 3
FirstName John Jane Elisa
I hope this can be solved without using cursors and temp tables, but maybe that's the way to go?
EDIT:
Forgot to mention, I'm using sqlserver (mssql)
EDIT2:
I'm not all that good at explaining, so here is some more code to do that job for me.
This will add another column to the Worker table, and a query to show the desired result. It's the query that has to be "smarter" so it can handle tables and columns added in the future. (Again, this is a proof of concept. The database has > 200 tables and > 1000 columns.)
--Add a column
alter table Worker add LastName nvarchar(max);
--Add some data
update Worker set LastName = 'Smith' where Id = 1;
update Worker set LastName = 'Smith' where Id = 2;
update Worker set LastName = 'Smith' where Id = 3;
--Query to give desired output (top 100, but only 3 rows in this table)
select 'FirstName' as 'Column',* from (select top 100
ROW_NUMBER() over (order by Id) as IdRows, FirstName from worker) as st
pivot
(
max(FirstName) for IdRows in ([1],[2],[3])
) as pt
union
select 'LastName' as 'Column',* from (select top 100
ROW_NUMBER() over (order by Id) as IdRows, LastName from worker) as st
pivot
(
max(LastName) for IdRows in ([1],[2],[3])
) as pt
To solve this, I used a temp table and filled it in a loop. It is not an elegant solution, but it works. I had to cast all the data to nvarchar to make it work.
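The loop-and-cast approach can be sketched outside the database as well. The following is only an illustration under stated assumptions: it uses Python's sqlite3 instead of SQL Server, reads table names from sqlite_master rather than the sys catalog views, orders by rowid because a generic loop cannot know each table's key, and transpose_top_rows is a hypothetical helper name. Every value is cast to text, as in the temp table solution.

```python
import sqlite3


def transpose_top_rows(conn, limit=100):
    """Return one tuple per column: (table, column, value_1, ..., value_n).

    Values come from the first `limit` rows of each table and are cast to
    text so that columns of different types can share one result shape.
    """
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    result = []
    for table in tables:
        # Table names come from the catalog, so quoting them is enough here.
        cur = conn.execute(
            f'SELECT * FROM "{table}" ORDER BY rowid LIMIT ?', (limit,))
        column_names = [d[0] for d in cur.description]
        rows = cur.fetchall()
        for index, column in enumerate(column_names):
            values = tuple(
                None if row[index] is None else str(row[index]) for row in rows)
            result.append((table, column) + values)
    return result


# Proof-of-concept data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Worker (Id INTEGER, FirstName TEXT, LastName TEXT);
INSERT INTO Worker VALUES (1, 'John', 'Smith'), (2, 'Jane', 'Smith'), (3, 'Elisa', 'Smith');
""")

pivoted = transpose_top_rows(conn)
for line in pivoted:
    print(line)
```

Tables with fewer than `limit` rows produce shorter tuples, so a caller that needs a rectangular result would have to pad them.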

SQL 2 Table Query

I am fairly new to SQL and looking for some assistance.
Essentially I have two tables: one for employees, and one called inStore, which is where their payment details are stored.
I am looking for a query that lists each employee's name and payment rate, which are stored across the two tables.
So far I have
SELECT paymentRate
FROM inStore
WHERE employeeID IN (SELECT employeeID
FROM employee
WHERE employeeID = 'C1234567')
That gives me the payment rate. I also need to display the name, which is stored in the employee table. I am sure this is quite simple, but after a while of troubleshooting I keep getting assorted errors whenever I change my query. Any assistance would be appreciated! Thanks in advance!
If I understand your question, you would like the employee name displayed alongside the payment rate. If so, you want to use a JOIN in your SQL query.
For example, to show the name of a given employee (say, Bob Bobbins), you would need a query like the following:
SELECT i.paymentRate, e.FirstName, e.LastName FROM inStore as i
inner join employee as e on i.employeeID = e.employeeID
WHERE i.employeeID ='C1234567'
The "as i" and "as e" parts just establish aliases, so you won't need to keep retyping employee.FirstName etc. over and over again.
Hope that helps.
Edit: it would also be good to have a foreign key relationship established for referential integrity.
I assume your InStore table can contain multiple records for a single employee. Please refer to the SQL query below, which is written in T-SQL.
DECLARE @Employee TABLE
(
Id INT,
Name VARCHAR(MAX)
)
DECLARE @InStore TABLE
(
Id INT,
Rate MONEY,
EmpId INT
)
INSERT INTO @Employee ( Id, Name )
VALUES ( 1,'ABC'),( 2,'DEF'),( 3,'GHI')
INSERT INTO @InStore( Id, Rate, EmpId )
VALUES ( 1, 10, 1 ), ( 2, 20, 2 ), ( 3, 30, 3 ),( 4, 40, 3 )
SELECT emp.Id,emp.Name,SUM(ins.Rate)
FROM @Employee emp
INNER JOIN @InStore ins ON ins.EmpId = emp.Id
GROUP BY emp.Id, emp.Name

Inner join and possibly a cte? Inner join and partial match in same row

I currently match #teacher to #coursesCSV using TeacherId, So with the INNER JOIN there's a one-one and I get one row. Once I get this match, I need to display the possible #coursesCsv.IsExpired for that particular TeacherId in that same row. So I match the first 3 chars and the last 4 chars, but ignore the 3 chars in the middle. With this criteria, there would only be two matches, and that's why the result displays 'OK/NOK'. The maximum number of matches here will be 2.
So the result should look like the following:
teacherid isexpired WhatMatched
ABC-001-1225 OK OK/NOK
If that's too difficult, another possible result would be a count:
teacherid isexpired WhatMatched
ABC-001-1225 OK 2
I've been trying to get a '2' for WhatMatched but I keep getting 3, and I'm stuck there. The important thing is that the result can only consist of 1 row.
The reason I'm doing this is that we have a grid that populates using #teacher.TeacherId inner join #coursesCSV, and this row is evaluated and approved by a user. In this case, he will naturally see 1 row: ABC-001-1225 and OK. The website will not let him approve because there's a NOK (ABC-002-1225). I'm adding this so that he knows he needs to check something instead of having to ask me why he can't approve since it says OK.
This is the query:
IF OBJECT_ID('tempdb..#teacher') IS NOT NULL DROP TABLE #teacher
IF OBJECT_ID('tempdb..#coursesCsv') IS NOT NULL DROP TABLE #coursesCsv
create table #teacher
(
TeacherID varchar(20),
FullName varchar(30),
DeptId int
)
insert into #teacher select 'ABC-001-1225', 'Roy Brown', 3
create table #coursesCsv
(
IsExpired varchar(3),
TeacherID varchar(20),
DeptId int
)
insert into #coursesCsv select 'OK', 'ABC-001-1225', 3
insert into #coursesCsv select 'NOK', 'ABC-002-1225', 3
insert into #coursesCsv select 'OK', 'XYZ-002-1225', 3
select t.teacherid, c.isexpired, c.coursecnt, c.prefix
from #teacher t
inner join
(
select
teacherid,
left(teacherid, 3) as 'Prefix',
isexpired,
count(*)
over (partition by right(teacherid,4)) as coursecnt
from #coursesCsv
) as c
on t.teacherid = c.teacherid
and left(t.teacherid, 3) = left(c.teacherid, 3)
I may not understand this 100% .... but I think you need to partition by the first 3 and last 4 characters of teacherid. So...
select t.teacherid, c.isexpired, c.coursecnt, c.prefix
from #teacher t
inner join
(
select
teacherid,
left(teacherid, 3) as 'Prefix',
isexpired,
count(*)
over (partition by left(teacherid, 3), right(teacherid,4)) as coursecnt
from #coursesCsv
) as c
on t.teacherid = c.teacherid
and left(t.teacherid, 3) = left(c.teacherid, 3)
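To check the difference between the two partitions on the sample data, here is a rough sketch. It assumes Python's sqlite3 (SQLite 3.25 or later) instead of SQL Server, so left(x, 3) becomes substr(x, 1, 3) and right(x, 4) becomes substr(x, -4), and the temp tables are plain tables. The last query, which builds the 'OK/NOK' string, is my own addition and not part of the answer above.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teacher (TeacherID TEXT, FullName TEXT, DeptId INTEGER);
CREATE TABLE coursesCsv (IsExpired TEXT, TeacherID TEXT, DeptId INTEGER);
INSERT INTO teacher VALUES ('ABC-001-1225', 'Roy Brown', 3);
INSERT INTO coursesCsv VALUES
    ('OK', 'ABC-001-1225', 3),
    ('NOK', 'ABC-002-1225', 3),
    ('OK', 'XYZ-002-1225', 3);
""")

QUERY = """
SELECT t.TeacherID, c.IsExpired, c.coursecnt
FROM teacher t
INNER JOIN (
    SELECT TeacherID, IsExpired,
           COUNT(*) OVER (PARTITION BY {partition}) AS coursecnt
    FROM coursesCsv
) AS c ON t.TeacherID = c.TeacherID
"""

# Partitioning on the last 4 characters only also counts the XYZ row.
suffix_only = conn.execute(
    QUERY.format(partition="substr(TeacherID, -4)")).fetchall()

# Partitioning on the first 3 and the last 4 characters counts only ABC rows.
prefix_and_suffix = conn.execute(
    QUERY.format(
        partition="substr(TeacherID, 1, 3), substr(TeacherID, -4)")).fetchall()

# Collect the matching statuses in insertion order and join them in Python.
statuses = [status for (status,) in conn.execute("""
SELECT c.IsExpired
FROM coursesCsv c
INNER JOIN teacher t
    ON substr(c.TeacherID, 1, 3) = substr(t.TeacherID, 1, 3)
   AND substr(c.TeacherID, -4) = substr(t.TeacherID, -4)
ORDER BY c.rowid
""")]
what_matched = "/".join(statuses)

print(suffix_only, prefix_and_suffix, what_matched)
```

On SQL Server 2017 or later the same string could probably be built in the query itself with STRING_AGG, but that is not tested here.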

SQL Left Join first match only

I have a query against a large number of big tables (rows and columns) with a number of joins, however one of tables has some duplicate rows of data causing issues for my query. Since this is a read only realtime feed from another department I can't fix that data, however I am trying to prevent issues in my query from it.
Given that, I need to add this crap data as a left join to my good query. The data set looks like:
IDNo FirstName LastName ...
-------------------------------------------
uqx bob smith
abc john willis
ABC john willis
aBc john willis
WTF jeff bridges
sss bill doe
ere sally abby
wtf jeff bridges
...
(about 2 dozen columns, and 100K rows)
My first instinct was to perform a DISTINCT, which gave me about 80K rows:
SELECT DISTINCT P.IDNo
FROM people P
But when I try the following, I get all the rows back:
SELECT DISTINCT P.*
FROM people P
OR
SELECT
DISTINCT(P.IDNo) AS IDNoUnq
,P.FirstName
,P.LastName
...etc.
FROM people P
I then thought I would do a FIRST() aggregate function on all the columns, however that feels wrong too. Syntactically am I doing something wrong here?
Update:
Just wanted to note: these records are duplicates based on the non-key, non-indexed ID field listed above. The ID is a text field that holds the same value in each duplicate, but in a different letter case, and that is what causes the issue.
distinct is not a function. It always operates on all columns of the select list.
Your problem is a typical "greatest N per group" problem which can easily be solved using a window function:
select ...
from (
select IDNo,
FirstName,
LastName,
....,
row_number() over (partition by lower(idno) order by firstname) as rn
from people
) t
where rn = 1;
Using the order by clause you can select which of the duplicates you want to pick.
The above can be used in a left join, see below:
select ...
from x
left join (
select IDNo,
FirstName,
LastName,
....,
row_number() over (partition by lower(idno) order by firstname) as rn
from people
) p on p.idno = x.idno and p.rn = 1
where ...
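Here is a small sketch of the row_number() approach on the play data from this thread. Several things are assumptions on my part: it runs on Python's sqlite3 (SQLite 3.25 or later) rather than SQL Server, the table x with a single idno column is hypothetical, the ordering uses the entry column so that the chosen duplicate is deterministic, and the join compares lower() on both sides so it does not rely on a case-insensitive collation.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (entry INTEGER, IDNo TEXT, FirstName TEXT, LastName TEXT);
INSERT INTO people VALUES
    (1, 'uqx', 'bob', 'smith'),
    (2, 'abc', 'john', 'willis'),
    (3, 'ABC', 'john', 'willis'),
    (4, 'aBc', 'john', 'willis'),
    (5, 'WTF', 'jeff', 'bridges'),
    (6, 'Sss', 'bill', 'doe'),
    (7, 'sSs', 'bill', 'doe'),
    (8, 'ssS', 'bill', 'doe'),
    (9, 'ere', 'sally', 'abby'),
    (10, 'wtf', 'jeff', 'bridges');
-- Hypothetical "good" table that the duplicates get joined to
CREATE TABLE x (idno TEXT);
INSERT INTO x VALUES ('ABC'), ('zzz');
""")

# Number the rows inside each case-insensitive IDNo group.
DEDUPED = """
SELECT entry, IDNo, FirstName, LastName,
       ROW_NUMBER() OVER (PARTITION BY lower(IDNo) ORDER BY entry) AS rn
FROM people
"""

# Keep only the first row of each group.
first_rows = conn.execute(
    f"SELECT entry, IDNo FROM ({DEDUPED}) AS d WHERE rn = 1 ORDER BY entry"
).fetchall()

# Left join against the numbered rows; rn = 1 goes in the ON clause so that
# rows of x without any match are still returned.
joined = conn.execute(f"""
SELECT x.idno, p.entry, p.FirstName
FROM x
LEFT JOIN ({DEDUPED}) AS p
    ON lower(p.IDNo) = lower(x.idno) AND p.rn = 1
ORDER BY x.idno
""").fetchall()

print(first_rows)
print(joined)
```

Putting rn = 1 in the WHERE clause instead of the ON clause would turn the left join back into an inner join, because the unmatched rows have rn = NULL.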
Add an identity column (PeopleID) and then use a correlated subquery to return the first value for each value.
SELECT *
FROM People p
WHERE PeopleID = (
SELECT MIN(PeopleID)
FROM People
WHERE IDNo = p.IDNo
)
After careful consideration, this dilemma has a few different solutions:
Aggregate Everything
Use an aggregate on each column to get the biggest or smallest field value. This is what I am doing since it takes 2 partially filled out records and "merges" the data.
http://sqlfiddle.com/#!3/59cde/1
SELECT
UPPER(IDNo) AS user_id
, MAX(FirstName) AS name_first
, MAX(LastName) AS name_last
, MAX(entry) AS row_num
FROM people P
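GROUP BY
UPPER(IDNo) -- group on the normalized value so the result does not depend on a case-insensitive collation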
Get First (or Last record)
http://sqlfiddle.com/#!3/59cde/23
-- ------------------------------------------------------
-- Notes
-- entry: auto-number primary key; some sort of unique PK is required for this method
-- IDNo: should be the primary key in the feed, but is not; we are making an upper-case version
-- This gets the first entry; to get the last entry, change MIN() to MAX()
-- ------------------------------------------------------
SELECT
PC.user_id
,PData.FirstName
,PData.LastName
,PData.entry
FROM (
SELECT
P2.user_id
,MIN(P2.entry) AS rownum
FROM (
SELECT
UPPER(P.IDNo) AS user_id
, P.entry
FROM people P
) AS P2
GROUP BY
P2.user_id
) AS PC
LEFT JOIN people PData
ON PData.entry = PC.rownum
ORDER BY
PData.entry
Use Cross Apply or Outer Apply, this way you can limit the amount of data to be joined from the table with the duplicates to the first hit.
Select
x.*,
c.*
from
x
Cross Apply
(
Select
Top (1)
IDNo,
FirstName,
LastName,
....,
from
people As p
where
p.idno = x.idno
Order By
p.idno -- unnecessary if you don't need a specific match based on order
) As c
Cross Apply behaves like an inner join, Outer Apply like a left join
SQL Server CROSS APPLY and OUTER APPLY
Turns out I was doing it wrong, I needed to perform a nested select first of just the important columns, and do a distinct select off that to prevent trash columns of 'unique' data from corrupting my good data. The following appears to have resolved the issue... but I will try on the full dataset later.
SELECT DISTINCT P2.*
FROM (
SELECT
IDNo
, FirstName
, LastName
FROM people P
) P2
Here is some play data as requested: http://sqlfiddle.com/#!3/050e0d/3
CREATE TABLE people
(
[entry] int
, [IDNo] varchar(3)
, [FirstName] varchar(5)
, [LastName] varchar(7)
);
INSERT INTO people
(entry,[IDNo], [FirstName], [LastName])
VALUES
(1,'uqx', 'bob', 'smith'),
(2,'abc', 'john', 'willis'),
(3,'ABC', 'john', 'willis'),
(4,'aBc', 'john', 'willis'),
(5,'WTF', 'jeff', 'bridges'),
(6,'Sss', 'bill', 'doe'),
(7,'sSs', 'bill', 'doe'),
(8,'ssS', 'bill', 'doe'),
(9,'ere', 'sally', 'abby'),
(10,'wtf', 'jeff', 'bridges')
;
Try this
SELECT *
FROM people P
where P.IDNo in (SELECT DISTINCT IDNo
FROM people)
Depending on the nature of the duplicate rows, it looks like all you want is a case-insensitive comparison on those columns. Applying a case-insensitive (CI) collation to these columns should be what you're after:
SELECT DISTINCT p.IDNO COLLATE SQL_Latin1_General_CP1_CI_AS, p.FirstName COLLATE SQL_Latin1_General_CP1_CI_AS, p.LastName COLLATE SQL_Latin1_General_CP1_CI_AS
FROM people P
http://msdn.microsoft.com/en-us/library/ms184391.aspx

How to Insert data into ODD/EVEN rows only in SQL

I have one table with gender as one of the columns.
Only M or F is allowed in the gender column.
Now I want to sort the table so that, when it is displayed, M and F alternate in the gender field.
What I have tried:
I created a new table with the same structure as my existing table.
Using a high-level insert, I want to insert M into the odd rows and F into the even rows.
After that I want to combine the two statements using the UNION operator.
I am able to insert only male or only female rows into the new table, but not into the even or odd rows specifically.
Can anybody help me with this?
Thanks in advance.
Don't consider a table to be "sorted". The SQL server may return the rows in any order depending on execution plan, index, joins etc. If you want a strict order you need to have an ordered column, like an identity column. Usually it is better to apply the desired sorting when selecting data.
However the interleaving of M and F is a little bit tricky, you need to use the ROW_NUMBER function.
Valid SQL Server code:
CREATE TABLE #GenderTable(
[Name] [nchar](10) NOT NULL,
[Gender] [char](1) NOT NULL
)
-- Create sample data
insert into #GenderTable (Name, Gender) values
('Adam', 'M'),
('Ben', 'M'),
('Casesar', 'M'),
('Alice', 'F'),
('Beatrice', 'F'),
('Cecilia', 'F')
SELECT * FROM #GenderTable
SELECT * FROM #GenderTable
order by ROW_NUMBER() over (partition by gender order by name), Gender
DROP TABLE #GenderTable
This gives the output
Name Gender
Adam M
Ben M
Casesar M
Alice F
Beatrice F
Cecilia F
and
Name Gender
Alice F
Adam M
Beatrice F
Ben M
Cecilia F
Casesar M
If you use another DBMS the syntax may differ.
I think the best way to do it would be to have two queries (one for M, one for F) and then join them together. The catch would be you would have to calculate the "rank" of each query and then sort accordingly.
Something like the following (MySQL user-variable syntax) should do what you need:
select * from
(select
@rownum:=@rownum+1 row_rank,
t.*
from people_table t,
(SELECT @rownum:=0) r
where t.gender = 'M'
union
select
@rownum:=@rownum+1 row_rank,
t.*
from people_table t,
(SELECT @rownum:=0) r
where t.gender = 'F') joined
order by joined.row_rank, joined.gender;
If you are using SQL Server, you can seed your two tables with an IDENTITY column as follows. Make one odd and one even and then union and sort by this column.
Note that you can only truly alternate if there are the same number of male and female records. If there are more of one than the other, you will end up with non-alternating rows at the end.
CREATE TABLE MaleTable(Id INT IDENTITY(1,2) NOT NULL, Gender CHAR(1) NOT NULL)
INSERT INTO MaleTable(Gender) SELECT 'M'
INSERT INTO MaleTable(Gender) SELECT 'M'
INSERT INTO MaleTable(Gender) SELECT 'M'
CREATE TABLE FemaleTable(Id INT IDENTITY(2,2) NOT NULL, Gender CHAR(1) NOT NULL)
INSERT INTO FemaleTable(Gender) SELECT 'F'
INSERT INTO FemaleTable(Gender) SELECT 'F'
INSERT INTO FemaleTable(Gender) SELECT 'F'
SELECT u.Id
,u.Gender
FROM (
SELECT Id, Gender
FROM FemaleTable
UNION
SELECT Id, Gender
FROM MaleTable
) u
ORDER BY u.Id ASC
See here for a working example
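The caveat about unequal counts is easy to see in a small sketch. This one assumes Python's sqlite3 instead of SQL Server. SQLite has no IDENTITY(seed, increment), so the odd and even Id values are generated in Python to imitate IDENTITY(1,2) and IDENTITY(2,2). The row counts (four male, two female) are made up for the illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MaleTable (Id INTEGER PRIMARY KEY, Gender TEXT NOT NULL);
CREATE TABLE FemaleTable (Id INTEGER PRIMARY KEY, Gender TEXT NOT NULL);
""")

# Imitate IDENTITY(1,2): 1, 3, 5, 7 for the four male rows.
conn.executemany(
    "INSERT INTO MaleTable (Id, Gender) VALUES (?, 'M')",
    [(1 + 2 * i,) for i in range(4)])
# Imitate IDENTITY(2,2): 2, 4 for the two female rows.
conn.executemany(
    "INSERT INTO FemaleTable (Id, Gender) VALUES (?, 'F')",
    [(2 + 2 * i,) for i in range(2)])

ordered = conn.execute("""
SELECT u.Id, u.Gender
FROM (
    SELECT Id, Gender FROM FemaleTable
    UNION
    SELECT Id, Gender FROM MaleTable
) u
ORDER BY u.Id ASC
""").fetchall()

# Collapse the genders into one string to make the pattern visible.
sequence = "".join(gender for _, gender in ordered)
print(sequence)
```

The rows alternate only while both tables still have rows to contribute. After that, the remaining male rows follow one another.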