Generate rows where none exist - sql

I'm a little stumped on how to generate rows when none exist for specified conditions. Apologies for the formatting since I don't know how to write tables in SO posts, but let's say I have data that looks like this:
TimePeriodID CityspanSiteKey Mean_Name Mean
2 123 Social Environment 4
2 123 Youth with Adults 3.666666746
2 123 Youth with Peers 3.5
4 123 Social Environment 2.75
4 123 Youth with Adults 2.555555582
4 123 Youth with Peers 3.5
There are a few other Mean_Name values which I would like to include in every single time period ID, but just a Mean value of NULL, like the following:
TimePeriodID CityspanSiteKey Mean_Name Mean
2 123 Social Environment 4
2 123 Youth with Adults 3.666666746
2 123 Youth with Peers 3.5
2 123 Staff Build Relationships and Support Individual Youth NULL
2 123 Staff Positively Guide Behavior NULL
4 123 Social Environment 2.75
4 123 Youth with Adults 2.555555582
4 123 Youth with Peers 3.5
4 123 Staff Build Relationships and Support Individual Youth NULL
4 123 Staff Positively Guide Behavior NULL
5 123 Social Environment 2.75
5 123 Youth with Adults 2.555555582
5 123 Youth with Peers 3.5
5 123 Staff Build Relationships and Support Individual Youth NULL
5 123 Staff Positively Guide Behavior NULL
6 123 Social Environment NULL
6 123 Youth with Adults NULL
6 123 Youth with Peers NULL
6 123 Staff Build Relationships and Support Individual Youth NULL
6 123 Staff Positively Guide Behavior NULL
What's the best way to go about doing this? I don't think CASEing will be of much use since these records don't exist.

You seem to want a cross join followed by a left join. Not all of the values are in your original data, so you might as well construct them:
    select ti.timeperiod, c.CityspanSiteKey, m.mean_name, t.mean
    from (values (2), (4), (5), (6)) ti(timeperiod)
    cross join (values (123)) c(CityspanSiteKey)
    cross join (values ('Social Environment'),
                       ('Youth with Adults'),
                       ('Youth with Peers'),
                       ('Staff Build Relationships and Support Individual Youth'),
                       ('Staff Positively Guide Behavior')
               ) m(mean_name)
    left join t
      on t.TimePeriodID = ti.timeperiod
     and t.CityspanSiteKey = c.CityspanSiteKey
     and t.mean_name = m.mean_name;
Here t stands for your original table. You can use subqueries or existing tables instead of the values() clause.
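As a rough illustration of the "subqueries or existing tables" variant, here is a minimal sketch that runs the same cross join plus left join against an in-memory SQLite database from Python. The table name t and the use of SQLite are assumptions made for the demo; SQLite does not accept the `(values ...) alias(col)` form, so the constructed lists are written as CTEs instead, and the site keys come from the existing table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed table name "t"; columns follow the question's sample data.
    CREATE TABLE t (TimePeriodID INTEGER, CityspanSiteKey INTEGER,
                    Mean_Name TEXT, Mean REAL);
    INSERT INTO t VALUES
        (2, 123, 'Social Environment', 4),
        (2, 123, 'Youth with Adults', 3.666666746),
        (2, 123, 'Youth with Peers', 3.5),
        (4, 123, 'Social Environment', 2.75),
        (4, 123, 'Youth with Adults', 2.555555582),
        (4, 123, 'Youth with Peers', 3.5);
""")

query = """
WITH periods(TimePeriodID) AS (VALUES (2), (4), (5), (6)),
     -- an existing table used in place of a values() list
     sites AS (SELECT DISTINCT CityspanSiteKey FROM t),
     names(Mean_Name) AS (VALUES
         ('Social Environment'),
         ('Youth with Adults'),
         ('Youth with Peers'),
         ('Staff Build Relationships and Support Individual Youth'),
         ('Staff Positively Guide Behavior'))
SELECT p.TimePeriodID, s.CityspanSiteKey, n.Mean_Name, t.Mean
FROM periods p
CROSS JOIN sites s
CROSS JOIN names n
LEFT JOIN t
       ON t.TimePeriodID = p.TimePeriodID
      AND t.CityspanSiteKey = s.CityspanSiteKey
      AND t.Mean_Name = n.Mean_Name
ORDER BY p.TimePeriodID, n.Mean_Name
"""

# Every period/site/name combination appears once; Mean is None where no row exists.
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The cross joins build the full grid of combinations first, and the left join only fills in Mean where a matching row exists, which is why no CASE expression is needed.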

Related

Postgres rank() without duplicates

I'm ranking race data for a series of cycling events. Racers win various amounts of points for their position in races. I want to retain the discrete event scoring, but also rank the racer in the series. For example, consider a sub-query that returns this:
License # | Rider Name | Total Points | Race Points | Race ID
123 | Joe | 25 | 5 | 567
123 | Joe | 25 | 12 | 234
123 | Joe | 25 | 8 | 987
456 | Ahmed | 20 | 12 | 567
456 | Ahmed | 20 | 8 | 234
You can see Joe has 25 points, as he won 5, 12, and 8 points in three races. Ahmed has 20 points, as he won 12 and 8 points in two races.
Now for the ranking, what I'd like is:
Place | License # | Rider Name | Total Points | Race Points | Race ID
1 | 123 | Joe | 25 | 5 | 567
1 | 123 | Joe | 25 | 12 | 234
1 | 123 | Joe | 25 | 8 | 987
2 | 456 | Ahmed | 20 | 12 | 567
2 | 456 | Ahmed | 20 | 8 | 234
But if I use rank() and order by "Total Points", I get:
Place | License # | Rider Name | Total Points | Race Points | Race ID
1 | 123 | Joe | 25 | 5 | 567
1 | 123 | Joe | 25 | 12 | 234
1 | 123 | Joe | 25 | 8 | 987
4 | 456 | Ahmed | 20 | 12 | 567
4 | 456 | Ahmed | 20 | 8 | 234
Which makes sense, since there are three "ties" at 25 points.
dense_rank() solves this problem, but if there are legitimate ties across different racers, I want there to be gaps in the rank (e.g. if Joe and Ahmed both had 25 points, the next racer would be in third place, not second).
The easiest way to solve this, I think, would be to issue two queries: one with the "duplicate" racers eliminated, and a second one where I can retain the individual race data, which I need for the points breakdown display.
I can also probably, given enough effort, think of a way to do this in a single query, but I'm wondering if I'm not just missing something really obvious that could accomplish this in a single, relatively simple query.
Any suggestions?
You have to break this into steps to get what you want, but that can be done in a single query with common table expressions:
    with riders as (   -- get individual riders
      select distinct license, rider, total_points
      from race_results
    ), places as (     -- calculate non-dense rankings
      select license, rider, rank() over (order by total_points desc) as place
      from riders
    )
    select p.place, r.*   -- join rankings into main table
    from places p
    join race_results r on (r.license, r.rider) = (p.license, p.rider);
Here race_results stands for your sub-query or table.
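To check how the gaps behave with a real tie, here is a small sketch of the same CTE approach run through Python's sqlite3 module (window functions need SQLite 3.25 or later). The table name race_results, the column names, and the two extra riders Sam and Lee are hypothetical additions for the demo and are not part of the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table standing in for the question's sub-query.
    CREATE TABLE race_results (license INTEGER, rider TEXT, total_points INTEGER,
                               race_points INTEGER, race_id INTEGER);
    INSERT INTO race_results VALUES
        (123, 'Joe',   25,  5, 567),
        (123, 'Joe',   25, 12, 234),
        (123, 'Joe',   25,  8, 987),
        (456, 'Ahmed', 20, 12, 567),
        (456, 'Ahmed', 20,  8, 234),
        (789, 'Sam',   20, 20, 567),  -- hypothetical rider tied with Ahmed
        (321, 'Lee',   10, 10, 234);  -- hypothetical rider after the tie
""")

query = """
WITH riders AS (          -- one row per rider
    SELECT DISTINCT license, rider, total_points
    FROM race_results
), places AS (            -- rank riders, keeping gaps after ties
    SELECT license, rider,
           RANK() OVER (ORDER BY total_points DESC) AS place
    FROM riders
)
SELECT p.place, r.license, r.rider, r.total_points, r.race_points, r.race_id
FROM places p
JOIN race_results r ON r.license = p.license AND r.rider = p.rider
ORDER BY p.place, r.license, r.race_id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

# Place per rider, collapsed from the per-race rows.
place_by_rider = {rider: place for place, _, rider, *_ in rows}
print(place_by_rider)
```

Because rank() runs over the de-duplicated riders rather than the per-race rows, Joe's three races no longer count as three ties, while the genuine tie between two riders still pushes the next rider down a place.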

Crosstab query to get results of three tables based on results table

This may have been asked many times, but I searched last night and came up with nothing.
I have three tables
Table A
ID | City
1 | LA
2 | NY
3 | LV
Table B
ID | Job
11 | Programmer
22 | Engineer
33 | Database Administrator
44 | Cyber Security Analyst
Table C
ID | Job level
111 | Junior
222 | Associate
333 | Senior
444 | Director
Final table
ID | EmployeeName | City_ID | Job_ID | Level_ID
1000 | Susie | 1 | 11 | 333
1001 | Nora | 2 | 11 | 222
1002 | Jackie | 2 | 22 | 111
1003 | Mackey | 1 | 11 | 444
1004 | Noah | 1 | 11 | 111
I’d like to have a crosstab query using Microsoft Access that returns the following result ( based on city )
LA Table
Jobs | Junior | Associate | Senior | Director
Programmer | 1 | - | 1 | 1
Engineer | - | - | - | -
Database Administrator | - | - | - | -
Cyber Security Analyst | - | - | - | -
How can I do it?
The best approach for this is always:
1. Create a "base" query that joins the base tables and returns all data columns that you will need for the crosstab query.
2. Run the crosstab query wizard using the "base" query as input.
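The two steps can be sketched outside Access as well. In Access the wizard generates a TRANSFORM ... PIVOT statement; the sketch below instead uses SQLite from Python, which has no TRANSFORM, so the pivot is emulated with conditional aggregation. The table names (city, job, job_level, employee) and the view name base are hypothetical stand-ins for Table A, Table B, Table C and the final table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city      (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE job       (id INTEGER PRIMARY KEY, job TEXT);
    CREATE TABLE job_level (id INTEGER PRIMARY KEY, job_level TEXT);
    CREATE TABLE employee  (id INTEGER PRIMARY KEY, employee_name TEXT,
                            city_id INTEGER, job_id INTEGER, level_id INTEGER);

    INSERT INTO city VALUES (1, 'LA'), (2, 'NY'), (3, 'LV');
    INSERT INTO job VALUES (11, 'Programmer'), (22, 'Engineer'),
                           (33, 'Database Administrator'),
                           (44, 'Cyber Security Analyst');
    INSERT INTO job_level VALUES (111, 'Junior'), (222, 'Associate'),
                                 (333, 'Senior'), (444, 'Director');
    INSERT INTO employee VALUES
        (1000, 'Susie',  1, 11, 333),
        (1001, 'Nora',   2, 11, 222),
        (1002, 'Jackie', 2, 22, 111),
        (1003, 'Mackey', 1, 11, 444),
        (1004, 'Noah',   1, 11, 111);

    -- Step 1: the "base" query joins the lookup tables to the employee rows.
    CREATE VIEW base AS
    SELECT e.id AS employee_id, c.city, j.job, l.job_level
    FROM employee e
    JOIN city c      ON c.id = e.city_id
    JOIN job j       ON j.id = e.job_id
    JOIN job_level l ON l.id = e.level_id;
""")

# Step 2: pivot the base query for one city. The left join from job keeps
# jobs that have no employees in that city.
crosstab_sql = """
SELECT j.job,
       COUNT(CASE WHEN b.job_level = 'Junior'    THEN 1 END) AS Junior,
       COUNT(CASE WHEN b.job_level = 'Associate' THEN 1 END) AS Associate,
       COUNT(CASE WHEN b.job_level = 'Senior'    THEN 1 END) AS Senior,
       COUNT(CASE WHEN b.job_level = 'Director'  THEN 1 END) AS Director
FROM job j
LEFT JOIN base b ON b.job = j.job AND b.city = ?
GROUP BY j.id, j.job
ORDER BY j.id
"""

crosstab = conn.execute(crosstab_sql, ("LA",)).fetchall()
for row in crosstab:
    print(row)
```

A count of 0 here corresponds to a dash in the desired LA table. Keeping the joins in the base query means the pivot step only has to deal with readable names rather than IDs.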

How to handle null columns in a Relational Database Design

I have mostly worked with non-relational databases, but I need to switch gears and use a relational database, because the application I need to build will run complex queries that require joins between tables.
Before creating the database itself I had to think about the architecture, so I drew up a UML diagram for the database design.
This is how the TransactionDepositBreakdown table may look:
id amount date reference_number batch_id payment_processor_id mid_id main_dep_id
1 100 2020-10-11 900 null 1 100 2
2 101 2020-10-11 900 null 1 100 2
3 102 2020-10-11 900 null 1 100 1
4 103 2020-10-11 350 null 1 100 1
5 104 2020-10-11 350 null 1 100 3
6 105 2020-10-11 600 null 1 100 4
7 106 2020-10-11 null 1000 2 201 null
8 107 2020-10-11 null 1001 2 201 null
9 108 2020-10-11 null 1002 2 201 null
10 109 2020-10-11 null 1003 2 201 null
A reference_number can be assigned to multiple transaction deposit breakdowns
A batch_id is assigned to only one transaction deposit breakdown
There is a use case where a TransactionDepositBreakdown may have a reference number or a batch id, depending on the payment processor type (type 1 - reference number, type 2 - batch id). I'm not sure how to handle this case, but I'm thinking about the following options:
1. Add two tables, TransactionDepositBatch and TransactionDepositReference, which will have the transaction_deposit_id as a foreign key, with batch_id on the first table and reference_number on the latter one.
2. Keep the reference_number and batch_id columns in the TransactionDepositBreakdown table and have one of them null at all times, depending on the payment processor type.
Note: There might be a need to add another column to the TransactionDepositBreakdown table, such as card_type, which will have a value assigned only when the payment processor type is 1.
Is the first option the correct way to handle this, by also taking into consideration the above note?
Also, any recommendations regarding the UML that I've built would be really useful.
These one-of relationships are difficult to model in relational databases. Different databases have different capabilities, so some may have extensions that can be applied to this problem (such as Postgres's support of table inheritance).
Your situation is rather simple, given just two options. Under those circumstances, I would keep both columns in the TransactionDepositBreakdown table, for one simple reason: it easily allows you to design the data model with declared foreign key relationships. The downside is that you need space for both foreign keys, even if one of them is going to be NULL.
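You can also enforce that at most one of the two is set by using a check constraint:
    constraint chk_TransactionDepositBreakdown_reference_or_batch
        check (reference_number is null or batch_id is null);
Note that this still allows both columns to be NULL; to require exactly one, the condition also has to reject that case.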
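For the stricter "exactly one of the two" rule, a possible sketch in SQLite (run from Python) compares the two IS NULL tests so that the check passes only when they differ. The constraint name chk_reference_xor_batch and the helper try_insert are hypothetical, the table is cut down to the relevant columns, and the UNIQUE on batch_id is an assumption drawn from the statement that a batch_id belongs to only one breakdown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE TransactionDepositBreakdown (
        id               INTEGER PRIMARY KEY,
        amount           NUMERIC NOT NULL,
        reference_number INTEGER,
        batch_id         INTEGER UNIQUE,  -- assumed: one breakdown per batch
        -- Passes only when exactly one of the two columns is NULL.
        CONSTRAINT chk_reference_xor_batch CHECK (
            (reference_number IS NULL) <> (batch_id IS NULL)
        )
    )
""")


def try_insert(reference_number, batch_id):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO TransactionDepositBreakdown "
            "(amount, reference_number, batch_id) VALUES (100, ?, ?)",
            (reference_number, batch_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "reference only": try_insert(900, None),
    "batch only": try_insert(None, 1000),
    "both set": try_insert(900, 1000),
    "neither set": try_insert(None, None),
}
print(results)
```

Other databases may need the condition spelled out with AND/OR instead of comparing the two boolean tests directly, so treat the exact expression as SQLite-specific.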

Best way to store multiple dropdown list values in database?

I'm working with a company's CRM database, which contains the clients that the company works with. Employees can create new client entries through the CRM, and when someone creates a new entry and checks multiple services that the client provides, the service codes are all stored in one field. Example below:
clientid CompanyName Email Tel Services
1 Randomname1 XXX#... 33333 ;14;294;448
2 Randomname2 yyy#... 44444 ;448
3 Randomname3 zzz#... 55555 ;58;448;14;65;24
So my question is, what is the best way of storing the Services column values? There are around 60 different services that you can choose from.
The classic approach is a separate table with one row per client and service:
Clients Table:
clientid CompanyName Email Tel
1 Randomname1 XXX#... 33333
2 Randomname2 yyy#... 44444
3 Randomname3 zzz#... 55555
Client_Services Table:
client_id service_id
1 14
1 294
1 448
2 448
3 58
3 448
3 14
3 65
3 24
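Moving from the delimited field to that layout could look roughly like the sketch below, which uses SQLite from Python. The table and column names follow the example above; the composite primary key and the foreign key are assumptions about how one might constrain the Client_Services table, and the other client columns are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (clientid INTEGER PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE client_services (
        client_id  INTEGER NOT NULL REFERENCES clients(clientid),
        service_id INTEGER NOT NULL,
        PRIMARY KEY (client_id, service_id)  -- assumed: no duplicate pairs
    );
""")

# Rows as they are stored today, with all service codes in one field.
legacy_rows = [
    (1, "Randomname1", ";14;294;448"),
    (2, "Randomname2", ";448"),
    (3, "Randomname3", ";58;448;14;65;24"),
]

for clientid, name, services in legacy_rows:
    conn.execute("INSERT INTO clients VALUES (?, ?)", (clientid, name))
    # Split the delimited field; the leading ';' yields an empty piece to skip.
    service_ids = [int(code) for code in services.split(";") if code]
    conn.executemany(
        "INSERT INTO client_services VALUES (?, ?)",
        [(clientid, service_id) for service_id in service_ids],
    )

# Finding every client that offers a service is now a plain join.
clients_with_448 = [
    row[0]
    for row in conn.execute("""
        SELECT c.CompanyName
        FROM clients c
        JOIN client_services cs ON cs.client_id = c.clientid
        WHERE cs.service_id = 448
        ORDER BY c.clientid
    """)
]
print(clients_with_448)
```

With the codes in their own rows, lookups by service no longer need string matching on the Services field, and the database can reject duplicate or malformed entries.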

MS Access, Excel, SQL, and New Tables

I'm just starting out with MS Access 2010 and have the following setup: three Excel files, masterlist.x (which contains every product that I sell), vender1.x (which contains all products from vender1, of which I only sell some), and vender2.x (likewise, all products from vender2, of which I only sell some). Here's an example data collection:
masterlist.x
ID NAME PRICE
23 bananas .50
33 apples .75
35 nuts .87
38 raisins .25
vender1.x
ID NAME PRICE
23 bananas .50
25 pears .88
vender2.x
ID NAME PRICE
33 apples .75
35 nuts .87
38 raisins .25
49 kiwis .88
The vender lists get periodically updated with new items for sale and new prices. For example, if vender1 raises the price of bananas to $.75, my masterlist.x would need to be updated to reflect this.
Where I'm at now: I know how to import the three Excel files into Access. From there, I've been researching whether I need to set up relationships, create a macro, or write a SQL query to accomplish my goals. I'm not necessarily looking for a solution, but being pointed in the right direction would be great!
Also, once the masterlist.x table is updated, what feature would I use to see which line items were affected?
Update: discovered SQL /JOIN/ and have the following:
SELECT * FROM master
LEFT JOIN vender1
ON master.ID = vender1.ID
where master.PRICE <> vender1.PRICE;
This gives me the output (for the above scenario)
ID NAME PRICE ID NAME PRICE
23 bananas .50 23 bananas .75
What feature would instead give me:
masterlist.x
ID NAME PRICE
23 bananas .75
33 apples .75
35 nuts .87
38 raisins .25
Here is a heads-up, since you were asking for design ideas: I don't really fancy your current table schema. The following queries were built in SQL Server 2008, the nearest syntax to MS Access SQL that I could get in SQL Fiddle.
Please take a look:
Proposed table design:
vendor table:
VID VNAME
1 smp farms
2 coles
3 cold str
4 Anvil NSW
product table:
PID VID PNAME PPRICE
203 2 bananas 0.5
205 2 pears 0.88
301 3 bananas 0.78
303 3 apples 0.75
305 3 nuts 0.87
308 3 raisins 0.25
409 4 kiwis 0.88
masterlist:
ID PID MPRICE
1 203 0.5
2 303 0.75
3 305 0.87
4 308 0.25
Join queries can now easily update your masterlist, for example when a vendor changes the price of a product they supply to you, or when they stop supplying that product. You can add where clauses to the query for whatever conditions you need.
Query:
SELECT m.id, p.vid, p.pname, p.pprice
FROM masterlist m
LEFT JOIN product p ON p.pid = m.pid
;
Results:
ID VID PNAME PPRICE
1 2 bananas 0.5
2 3 apples 0.75
3 3 nuts 0.87
4 3 raisins 0.25
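To cover the original question of which master rows are affected and how to bring them up to date, here is one possible sketch against the proposed tables, run in SQLite from Python. The price change for product 203 simulates the "bananas go to .75" scenario from the question. Access usually expresses this as an update query with a join, so the correlated-subquery form below should be read as a portable SQL variant and not as Access syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (pid INTEGER PRIMARY KEY, vid INTEGER,
                          pname TEXT, pprice REAL);
    CREATE TABLE masterlist (id INTEGER PRIMARY KEY,
                             pid INTEGER REFERENCES product(pid),
                             mprice REAL);
    INSERT INTO product VALUES
        (203, 2, 'bananas', 0.5), (205, 2, 'pears', 0.88),
        (301, 3, 'bananas', 0.78), (303, 3, 'apples', 0.75),
        (305, 3, 'nuts', 0.87), (308, 3, 'raisins', 0.25),
        (409, 4, 'kiwis', 0.88);
    INSERT INTO masterlist VALUES
        (1, 203, 0.5), (2, 303, 0.75), (3, 305, 0.87), (4, 308, 0.25);

    -- Simulated vendor update: bananas (product 203) now cost 0.75.
    UPDATE product SET pprice = 0.75 WHERE pid = 203;
""")

# Which master rows are out of date, with old and new price side by side?
changed = conn.execute("""
    SELECT m.id, p.pname, m.mprice AS old_price, p.pprice AS new_price
    FROM masterlist m
    JOIN product p ON p.pid = m.pid
    WHERE m.mprice <> p.pprice
""").fetchall()
print(changed)

# Copy the vendor price into the master list, touching only the stale rows.
conn.execute("""
    UPDATE masterlist
    SET mprice = (SELECT p.pprice FROM product p WHERE p.pid = masterlist.pid)
    WHERE EXISTS (SELECT 1 FROM product p
                  WHERE p.pid = masterlist.pid
                    AND p.pprice <> masterlist.mprice)
""")

updated = conn.execute(
    "SELECT id, pid, mprice FROM masterlist ORDER BY id"
).fetchall()
print(updated)
```

Running the select before the update gives the list of affected line items, which answers the "which rows were changed" part without needing a separate audit feature.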
Please comment if you have any doubts. Happy to help.