SQL - count function not working correctly

I'm trying to count the blood types for each blood bank. I'm using Oracle DB.
The blood bank table is created like this:
CREATE TABLE BloodBank (
  BB_ID number(15),
  BB_name varchar2(255) not NULL,
  B_type varchar2(255),
  CONSTRAINT blood_ty_pk FOREIGN KEY (B_type) REFERENCES BloodType(B_type),
  salary number(15) not NULL,
  PRIMARY KEY (BB_ID)
);
INSERT INTO BloodBank (BB_ID,BB_name,b_type, salary)
VALUES (370,'new york Blood Bank','A+,A-,B+',12000);
INSERT INTO BloodBank (BB_ID,BB_name,b_type, salary)
VALUES (791,'chicago Blood Bank','B+,AB-,O-',90000);
INSERT INTO BloodBank (BB_ID,BB_name,b_type, salary)
VALUES (246,'los angeles Blood Bank','O+,A-,AB+',4500);
INSERT INTO BloodBank (BB_ID,BB_name,b_type, salary)
VALUES (360,'boston Blood Bank','A+,AB+',13000);
INSERT INTO BloodBank (BB_ID,BB_name,b_type, salary)
VALUES (510,'seattle Blood Bank','AB+,AB-,B+',2300);
select * from BloodBank;
When I use the count function:
select count(B_type)
from bloodbank
group by BB_ID;
the result would be like this: a count of 1 for every BB_ID.
So why is the count function not working correctly? I'm trying to display each blood bank's blood type count, which is more than one in this case.

I hope I don't get downvoted for solving the specific problem you're asking about, but this query would work:
select bb_id,
       bb_name,
       REGEXP_COUNT(b_type, ',') + 1
from bloodbank;
However, this solution ignores a MAJOR issue with your data, which is that it is not normalized, as @Tim Biegeleisen correctly instructs you to do. The solution I've provided is EXTREMELY hacky in that it counts the commas in your string to determine the number of blood types. This is not at all reliable, and you should 100% do what Tim B recommends. But for the circumstances you find yourself in, this will tell you how many different blood types are kept at a specific blood bank.
http://sqlfiddle.com/#!4/8ed1c2/2
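If you can't restructure the data yet, a middle-ground sketch is to split the comma-separated list into rows at query time. This assumes Oracle 11g or later and no empty elements inside b_type; the prior sys_guid() trick keeps CONNECT BY from cycling across rows:
select bb_id,
       bb_name,
       regexp_substr(b_type, '[^,]+', 1, level) as single_type
from bloodbank
connect by level <= regexp_count(b_type, ',') + 1
       and prior bb_id = bb_id
       and prior sys_guid() is not null;
Counting single_type per bb_id then becomes an ordinary GROUP BY.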

You should normalize your data and get each blood type value onto a separate record. That is, your starting data should look like this:
BB_ID | BB_name | b_type | salary
370 | new york Blood Bank | A+ | 12000
370 | new york Blood Bank | A- | 12000
370 | new york Blood Bank | B+ | 12000
... and so on
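A minimal DDL sketch of that restructured bloodbank table (the composite primary key is my assumption; column types are borrowed from your current table):
CREATE TABLE BloodBank (
  BB_ID number(15),
  BB_name varchar2(255) not NULL,
  B_type varchar2(255) REFERENCES BloodType(B_type),
  salary number(15) not NULL,
  PRIMARY KEY (BB_ID, B_type)
);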
With this data model, the query you want is something along these lines:
SELECT BB_ID, BB_name, b_type, COUNT(*) AS cnt
FROM bloodbank
GROUP BY BB_ID, BB_name, b_type;
Or, if you want just counts of types across all bloodbanks, then use:
SELECT b_type, COUNT(*) AS cnt
FROM bloodbank
GROUP BY b_type;

Related

How to get the values corresponding to another table?

I'm new to SQL and am a bit confused about how I would write a query in order to get the count of state in a different table.
I.e., I have this table [student]:
| id | school_code |
|----|-------------|
| 0  | 0123        |
| 1  | 2345        |
| 2  | 2345        |
And this other table [school]:
| school_code | name | State      |
|-------------|------|------------|
| 0123        | xxyy | New Jersey |
| 2345        | xyxy | Washington |
| 3456        | yxyx | Colorado   |
I want to find out how I would get this table, which tells me the entries for state by checking each student and making a count of how often that state occurs, ordered by most occurrences in the student table:
| State      | No. times occurred (iterating through student) |
|------------|------------------------------------------------|
| Washington | 2                                              |
| New Jersey | 1                                              |
SELECT school.state, count(school.state)
FROM student, school
WHERE student.school_code = school.school_code
GROUP BY school.state
ORDER BY count(school.state)
I'm not sure whether this would be iterating through each student and counting them, or just natural-joining student and school and then counting all the states.
When I run this on the data supplied, the number of times occurred is a really low number, which doesn't seem right.
We can simply JOIN the two tables and COUNT the school code in the students table, with GROUP BY state:
SELECT sc.state, COUNT(st.school_code)
FROM school sc
JOIN student st ON sc.school_code = st.school_code
GROUP BY sc.state;
We can try it out here: db<>fiddle
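Since the question also asks for the states ordered by most occurrences, a sketch of the same join with the ordering added (descending count):
SELECT sc.state, COUNT(st.school_code) AS occurrences
FROM school sc
JOIN student st ON sc.school_code = st.school_code
GROUP BY sc.state
ORDER BY COUNT(st.school_code) DESC;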

How to get the differences between two rows **and** the name of the field where the difference is, in BigQuery?

I have a table in BigQuery like this:
| Name | Phone Number | Address              |
|------|--------------|----------------------|
| John | 123456778564 | 1 Penny Lane         |
| John | 873452987424 | 1 Penny Lane         |
| Mary | 845704562848 | 87 5th Avenue        |
| Mary | 845704562848 | 54 Lincoln Rd.       |
| Amy  | 342847327234 | 4 Ocean Drive Avenue |
| Amy  | 347907387469 | 98 Truman Rd.        |
I want to get a table with the differences between two consecutive rows and the name of the field where the difference occurs. I mean this:
| Name | Field        | Before               | After          |
|------|--------------|----------------------|----------------|
| John | Phone Number | 123456778564         | 873452987424   |
| Mary | Address      | 87 5th Avenue        | 54 Lincoln Rd. |
| Amy  | Phone Number | 342847327234         | 347907387469   |
| Amy  | Address      | 4 Ocean Drive Avenue | 98 Truman Rd.  |
How can I do this? I've looked at other posts but couldn't find anything that corresponds to my need.
Thank you
Consider the below BigQuery'ish solution:
select Name, ['Phone Number', 'Address'][offset(offset)] Field,
       prev_field as Before, field as After
from (
  select timestamp, Name, offset, field,
         lag(field) over (partition by Name, offset order by timestamp) as prev_field
  from yourtable,
  unnest([`Phone Number`, Address]) field with offset
)
where prev_field != field
If applied to the sample data in your question, the output is the desired result shown above.
As you can see here - no matter how many columns in your table you need to compare - it is still just one query, no unions and such.
You just need to enumerate your columns in two places:
['Phone Number', 'Address'][offset(offset)] Field
and
unnest([`Phone Number`, Address]) field with offset
Note: you can further refactor the above using scripting's execute immediate to compose such lists within the query on the fly (check my other answers - I frequently use this technique in them).
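A rough sketch of that execute immediate refactoring, assuming the table lives at yourdataset.yourtable and that every column other than Name and timestamp should be compared (the variable names are mine):
declare name_list string;
declare col_list string;

-- one quoted list for the Field labels, one backticked list for unnest()
set name_list = (
  select string_agg(format("'%s'", column_name), ', ')
  from yourdataset.INFORMATION_SCHEMA.COLUMNS
  where table_name = 'yourtable'
    and column_name not in ('Name', 'timestamp')
);
set col_list = (
  select string_agg(format("`%s`", column_name), ', ')
  from yourdataset.INFORMATION_SCHEMA.COLUMNS
  where table_name = 'yourtable'
    and column_name not in ('Name', 'timestamp')
);

execute immediate format("""
select Name, [%s][offset(offset)] Field,
       prev_field as Before, field as After
from (
  select timestamp, Name, offset, field,
         lag(field) over (partition by Name, offset order by timestamp) as prev_field
  from yourdataset.yourtable,
  unnest([%s]) field with offset
)
where prev_field != field
""", name_list, col_list);
With this, adding a column to the table needs no change to the query text at all.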
One method is just to use lag() and union all:
select name, 'phone', prev_phone as before, phone as after
from (select name, phone,
             lag(phone) over (partition by name order by timestamp) as prev_phone
      from t
     ) t
where prev_phone <> phone
union all
select name, 'address', prev_address as before, address as after
from (select name, address,
             lag(address) over (partition by name order by timestamp) as prev_address
      from t
     ) t
where prev_address <> address

Given account contributions: How to sum contributions per individual? In relation to a threshold?

Given contribution amounts per account, how do I 1) SUM the contributions made by each individual, and 2) find the number of people who have contributed <, =, or > $5,000?
Right now I have a database table "[dbo].[FakeRRSPs]" which looks like:
Account_ID | Personal_ID | Contributions
My current code gives the # of unique individuals successfully:
select distinct(personal_id), sum(contributions), count(account_id),
       (select count(distinct personal_id)
        from [dbo].[FakeRRSPs])
from [dbo].[FakeRRSPs]
where personal_id is not null
group by personal_id
For example, there are 2M people holding 2.5M accounts.
Issues I face:
1) How do I count the number of individuals who contribute below, at, or above the $5K threshold (after SUM(contribution) per person)?
2) There are people who contribute $10K total, for example $5K in 2 accounts. Both accounts are picked up when I'm hoping to only capture the SUM(Contribution) for this person.
I hope this is clear enough - it certainly isn't to me! Thanks everyone.
SQL Fiddle
MS SQL Server 2017 Schema Setup:
create table Contribution (PID int, AID int, C int)

insert into Contribution (PID, AID, C) VALUES (235, 1245, 1200)
insert into Contribution (PID, AID, C) VALUES (256, 1246, 0)
insert into Contribution (PID, AID, C) VALUES (256, 1247, 3500)
insert into Contribution (PID, AID, C) VALUES (256, 1248, 10000)
insert into Contribution (PID, AID, C) VALUES (421, 1249, 0)
Query 1:
select *
from (select PID, sum(C) AS SC
      from Contribution
      group by PID) as test
where test.SC <= 5000
Results:
| PID | SC |
|-----|------|
| 235 | 1200 |
| 421 | 0 |
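To answer the threshold part for all three buckets at once, a sketch against the same Contribution table (the bucket labels are mine): sum per person in a derived table first, then group the per-person totals.
select case when t.SC < 5000 then 'below 5K'
            when t.SC = 5000 then 'at 5K'
            else 'above 5K' end as bucket,
       count(*) as people
from (select PID, sum(C) AS SC
      from Contribution
      group by PID) as t
group by case when t.SC < 5000 then 'below 5K'
              when t.SC = 5000 then 'at 5K'
              else 'above 5K' end
Because the inner query collapses each person to one row, someone with $5K in each of 2 accounts is counted once, at $10K, in the 'above 5K' bucket.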

combine/merge two columns into 1 (t-sql)

I have 2 tables: batters and pitchers.
I want to pull
playerid(nvarchar50), firstname(nvarchar50), lastname(nvarchar50), bats(nvarchar50)
from the batters table, and
playerid(nvarchar50), firstname(nvarchar50), lastname(nvarchar50), throws(nvarchar50)
from the pitchers table.
I want to combine the output so that when I get results it comes out like this:
playerid, firstname, lastname, throws, bats
Is this possible? I'm guessing it should be, but I've exhausted joins and unions and can't get the result set to come out that way. Remember they are two different tables.
bat table
playerID nameFirst nameLast bats
----------------------------------------
abreubo01 Bobby Abreu L
abreujo02 Jose Abreu R
abreuto01 Tony Abreu B
ackledu01 Dustin Ackley L
adamecr01 Cristhian Adames S
adamsla01 Lane Adams R
adamsma01 Matt Adams L
pit table
playerid nameFirst nameLast throws
------------------------------------------
abadfe01 Fernando Abad L
aceveal01 Alfredo Aceves R
achteaj01 A.J. Achter R
adamsau01 Austin Adams R
adamsmi03 Mike Adams R
adcocna01 Nathan Adcock R
affelje01 Jeremy Affeldt L
Desired Result
playerid nameFirst nameLast throws Bats
------------------------------------------------
abadfe01 Fernando Abad
aceveal01 Alfredo Aceves
achteaj01 A.J. Achter
adamsau01 Austin Adams
adamsmi03 Mike Adams
adcocna01 Nathan Adcock
affelje01 Jeremy Affeldt
Assumptions:
1) You just want a list of all the players without any restrictions
2) The same player ID never appears in both tables (or if it does, you don't care if they're listed twice)
Based on those assumptions, you can simply write:
SELECT playerID, nameFirst, nameLast, bats, NULL as throws
FROM bat
UNION ALL
SELECT playerID, nameFirst, nameLast, NULL as bats, throws
FROM pit
However, I think your data is not fully normalised. Both of these tables are actually lists of players, just with a slight variation in their attributes. So a more sensible overall approach would be simply to have a single "players" table with columns as follows:
playerID, nameFirst, nameLast, bats, throws
Bats and throws would both be nullable, in case the player doesn't do that action. If necessary you could add an extra "player_type" column to denote their role (batter, pitcher - or both, if that's allowed).
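If assumption 2 ever stops holding (a player appearing in both tables), a sketch of populating that single players table with a FULL OUTER JOIN, so bats and throws land on one row per player (the players name comes from the suggestion above):
SELECT COALESCE(b.playerID, p.playerid) AS playerID,
       COALESCE(b.nameFirst, p.nameFirst) AS nameFirst,
       COALESCE(b.nameLast, p.nameLast) AS nameLast,
       b.bats,
       p.throws
INTO players
FROM bat b
FULL OUTER JOIN pit p
  ON b.playerID = p.playerid;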
Once you've got that structure, the query is trivial:
SELECT playerID, nameFirst, nameLast, bats, throws
FROM players

columns manipulation in fast load

Hello, I am new to Teradata. I am loading a flat file into my TD DB using FastLoad.
My data set (CSV file) has some issues: some rows in the city column contain proper data, but some rows contain NULL. For the rows where the city column contains NULL, the city's value is stored in the next column, which is zip code, and so on. As a result, some rows contain extra columns due to the extra NULL in the row. An example is given below. How do I resolve this kind of issue in FastLoad? Can someone answer this with a SQL example?
City | Zipcode          | country            |
xyz  | 12               | Esp                |
abc  | 11               | Ger                |
Null | def(city's data) | 12(zipcode's data) | Por(country's data)
What about a different approach? Instead of solving this in FastLoad, load your data into a temporary table like DATABASENAME.CITIES_TMP with a structure like below:
City | zip_code | country | column4
xyz | 12 | Esp |
NULL | abc | 12 | Por
In the next step, create the target table DATABASENAME.CITY with the structure:
City | zip_code | country
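A minimal DDL sketch of those two tables (the VARCHAR types and lengths are assumptions; zip_code is kept as text because the shifted rows put city names in it):
CREATE TABLE DATABASENAME.CITIES_TMP (
  City     VARCHAR(100),
  zip_code VARCHAR(100),
  country  VARCHAR(100),
  column4  VARCHAR(100)
);

CREATE TABLE DATABASENAME.CITY (
  City     VARCHAR(100),
  zip_code VARCHAR(100),
  country  VARCHAR(100)
);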
As a final step you need to run 2 INSERT queries:
INSERT INTO DATABASENAME.CITY (City, zip_code, country)
SELECT City, zip_code, country
FROM DATABASENAME.CITIES_TMP
WHERE City NOT LIKE 'NULL'; /* WHERE City IS NOT NULL - depends if null is a text value or just an empty cell */

INSERT INTO DATABASENAME.CITY (City, zip_code, country)
SELECT zip_code, country, column4
FROM DATABASENAME.CITIES_TMP
WHERE City LIKE 'NULL'; /* WHERE City IS NULL - depends if null is a text value or just an empty cell */
Of course this will work only if all your data looks exactly like the sample you provided.
This will also work only when you need to do this once in a while. If you need to load data a few times a day it will be a little cumbersome, and then you should build some kind of ETL process with, for example, a tool like Talend.