SQL : how to perform calculations with unrelated tables? - sql

I have very basic understanding of SQL.
I have tables in a Confluence page with various data. I would like to create a template my team can use: they fill in those tables with some values, and an automatic calculation is then performed by an SQL query using the Table Transformer module.
The structure would be something similar to :
workload table
+-------+-------+
| task | hours |
+-------+-------+
| task1 | 2 |
| task2 | 5 |
+-------+-------+
rate table (yes, usually it will be one person)
+--------+------+
| person | rate |
+--------+------+
| John | 500 |
+--------+------+
project table (yes usually it will be a single project)
+----------+----------+
| project | duration |
+----------+----------+
| project1 | 15 |
+----------+----------+
And what I would like to achieve is to be able to compute the sum of the tasks multiplied by the rate multiplied by the duration. Don't mind the units, I have put a simple example but on my side, units work.
Of course I could use constants, and this would be much easier. But because of the way Confluence works, I can't easily fetch constants from the template in the SQL query (or at least I did not succeed in doing that), and putting constants in the SQL code is not very user-friendly for my colleagues, who would surely forget to change the numbers from time to time.
Basically, if I could easily reduce a table in SQL to a constant I can use everywhere, I think this would achieve what I want, but I don't know if this is easily feasible in SQL. Apparently, in Confluence, I cannot use CREATE FUNCTION.
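One way to "reduce a table to a constant" in plain SQL is a scalar subquery. The sketch below runs the idea against an in-memory SQLite database from Python; the table and column names (`workload`, `ratetable`, `project`) are assumptions modelled on the example tables, and whether the SQL dialect available in Confluence accepts exactly this syntax has not been checked here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE workload  (task TEXT, hours REAL);
    CREATE TABLE ratetable (person TEXT, rate REAL);
    CREATE TABLE project   (project TEXT, duration REAL);
    INSERT INTO workload  VALUES ('task1', 2), ('task2', 5);
    INSERT INTO ratetable VALUES ('John', 500);
    INSERT INTO project   VALUES ('project1', 15);
""")

# Each parenthesised SELECT is a scalar subquery: it collapses a table
# into a single value that can be used like a constant.
TOTAL_COST_SQL = """
    SELECT (SELECT SUM(hours) FROM workload)
         * (SELECT rate FROM ratetable LIMIT 1)
         * (SELECT duration FROM project LIMIT 1) AS total_cost
"""

total_cost = conn.execute(TOTAL_COST_SQL).fetchone()[0]
print(total_cost)  # (2 + 5) hours * 500 * 15
```

The `LIMIT 1` reflects the "usually one person / one project" remark; with several rows, an aggregate such as `SUM` or `AVG` would have to decide which value is meant.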

Related

Efficiently making rows based on pairs of columns that don't appear in another table

So I'm trying to model a basic recommended friend system based on user activity. In this model, people can join activities, and if two people aren't already friends and happen to join the same activity, their recommendation score for each other increases.
Most of my app uses Firebase, but for this system I'm trying to use BigQuery.
The current system I have in mind:
I would have this table to represent friendships. Since it's an undirected graph, A->B being in the table implies that B->A will also be in the table.
+-------+-------+--------------+
| User1 | User2 | TimeFriended |
+-------+-------+--------------+
| abc | def | 12345 |
| def | abc | 12345 |
| abc | rft | 3456 |
| ... | ... | ... |
+-------+-------+--------------+
I also plan for activity participation to be stored like so:
+------------+-----------+---------------+------------+
| ActivityId | CreatorID | ParticipantID | TimeJoined |
+------------+-----------+---------------+------------+
| abc | def | eft | 21234 |
| ... | ... | ... | ... |
+------------+-----------+---------------+------------+
Lastly, assume maybe there's a table that stores mutual activities for these recommended friends (not super important, but assume it looks like:)
+-------+-------+------------+
| User1 | User2 | ActivityID |
+-------+-------+------------+
| abc | def | eft |
| ... | ... | ... |
+-------+-------+------------+
So here's the query I want to run:
Get all the participants for a particular activity.
For each of these participants, get all the other participants that aren't their friend
Add that tuple of {participant, other non-friend participant} to the "mutual activities" table
So there are obviously a couple of ways to do this. I could make a simple BigQuery script with looping, but I'm not a fan of that because it'll result in a lot of scans, and since BigQuery doesn't use indexes it won't scale well (in terms of cost).
I could also maybe use a subquery with NOT EXISTS, something like SELECT ParticipantID from activities WHERE activityID = x AND NOT EXISTS {something to show that there doesn't exist a friend relation}, but then it's unclear how to make this work for every participant in one go. I'd be fine if I can come to a solution whose table scans scale linearly with the number of participants, but I have the premonition that even if I somehow get this to work, every NOT EXISTS will result in a full scan per participant pair, resulting in quadratic scaling.
There might be something I can do with joining, but I'm not sure.
Would love some thoughts and guidance on this. I'm not very used to SQL, especially complex queries like this.
PS: If any of y'all would like to suggest another serverless solution rather than BigQuery, go ahead please :)
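For what it's worth, the three steps can be written as one set-based statement: a self-join of the participation table to form participant pairs, plus a LEFT JOIN anti-join against friendships. The sketch below uses SQLite from Python purely to make it runnable; the snake_case table and column names and the sample rows (`act1`, `zzz`) are made up for illustration, and how BigQuery would plan or bill the equivalent query is not verified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friendships (user1 TEXT, user2 TEXT, time_friended INTEGER);
    CREATE TABLE activities  (activity_id TEXT, creator_id TEXT,
                              participant_id TEXT, time_joined INTEGER);
    CREATE TABLE mutual_activities (user1 TEXT, user2 TEXT, activity_id TEXT);

    -- Friendships are stored in both directions, as in the question.
    INSERT INTO friendships VALUES ('abc', 'def', 12345), ('def', 'abc', 12345);
    INSERT INTO activities VALUES
        ('act1', 'zzz', 'abc', 1),
        ('act1', 'zzz', 'def', 2),
        ('act1', 'zzz', 'rft', 3);
""")

# Pair every participant with every other participant of the same activity,
# then keep only the pairs that have no matching friendship row.
PAIR_NON_FRIENDS_SQL = """
    INSERT INTO mutual_activities (user1, user2, activity_id)
    SELECT a.participant_id, b.participant_id, a.activity_id
    FROM activities AS a
    JOIN activities AS b
      ON b.activity_id = a.activity_id
     AND b.participant_id <> a.participant_id
    LEFT JOIN friendships AS f
      ON f.user1 = a.participant_id
     AND f.user2 = b.participant_id
    WHERE a.activity_id = ?
      AND f.user1 IS NULL
"""
conn.execute(PAIR_NON_FRIENDS_SQL, ("act1",))

pairs = conn.execute(
    "SELECT user1, user2 FROM mutual_activities ORDER BY user1, user2"
).fetchall()
print(pairs)
```

Because friendships are stored in both directions, a single join condition on (user1, user2) is enough; the abc/def pair is dropped in both orders.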

Avoid storing permutations of all possible product configurations

I'll simplify my problem using a shirt analogy. I have the following tables:
shirt sizes (e.g. small, medium, large...)
shirt colors (e.g. red, green, blue...)
shirt styles (e.g. short sleeve, long sleeve, collared...)
Now, I'd like to create prices for my inventory. Not all shirts are available in every configuration, but some are. For example:
All shirt styles and sizes are available in green. These are $1.
Only large, collared shirts are available in blue. These are $2.
Short sleeve red shirts in all sizes are $3, but long sleeve and collared red shirts are $4.
I could create another table with all available combinations of the three tables and store the prices. This seems inefficient and prone to error. How else can I store these relationships?
Background
It's important to know the terminology and the lexicon when you are researching how to accomplish something. What you are looking to do here is basically design a product configurator, or configure, price, quote (CPQ) system. These systems exist as proprietary and open source customizable off-the-shelf solutions. As a software architect for a mid-market B2B company, I am quite familiar with software that implements CPQ from scratch and also software that integrates with COTS solutions. If this is anything but an academic exercise, I would highly suggest you look at the myriad of free OSS CPQ tools. However, since this is Stack Overflow, I will address your question on a more technical level.
Four abstract layers
There are essentially four abstract layers to designing a product configuration system (which we will call a product configuration model).
Components and subcomponents
Attributes shared between those components
Tables / Relational Constraints connecting the components and sub components with their shared attributes
Expressions and expression constraints (which are non reusable statements that are conceptually the bottom layer)
Components
Let's take something simple like skateboards as a use case here. You may have a components table similar to the following
|---------------------|------------------|
| id | Name |
|---------------------|------------------|
| 1 | Decks |
|---------------------|------------------|
| 2 | Wheels |
|---------------------|------------------|
| 3 | Trucks |
|---------------------|------------------|
Sub Components
You may then have a sub components table similar to the following
|---------------------|------------------|------------------|
| id | Name | component_id |
|---------------------|------------------|------------------|
| 1 | Bearings | 2 |
|---------------------|------------------|------------------|
| 2 | Bushing | 3 |
|---------------------|------------------|------------------|
| 3 | Grip Tape | 1 |
|---------------------|------------------|------------------|
| 4 | Nuts / Bolts | 1 |
|---------------------|------------------|------------------|
As you can see in this simple example, you have one-to-one and one-to-many relationships between components and sub components. It is important that you do not confuse this with attributes, which we have not addressed yet.
Attributes
Your next layer of abstraction is attributes. Generally, all your attributes are associated through table constraints with components and sub components (and they are not limited to whether that particular combination exists or not).
For a simplified example, you might have an attributes table with the following rows
|---------------------|------------------|------------------|
| id | Category | Value |
|---------------------|------------------|------------------|
| 1 | Size | 7.5 |
|---------------------|------------------|------------------|
| 2 | Size | 7.75 |
|---------------------|------------------|------------------|
| 3 | Size | 6.25 |
|---------------------|------------------|------------------|
| 4 | Brand | Toy Machine |
|---------------------|------------------|------------------|
| 5 | Brand | Bird House |
|---------------------|------------------|------------------|
| 5 | Brand | Nike |
|---------------------|------------------|------------------|
| 5 | Model | Nyjah Pro |
|---------------------|------------------|------------------|
| 5 | Model | Vice Monster |
|---------------------|------------------|------------------|
| 6 | ABEC Rating | class 6 |
|---------------------|------------------|------------------|
| 7 | ABEC Rating | class 3 |
|---------------------|------------------|------------------|
As you can see, this table is not constrained in the same way your product major and product minor are (however, this is an oversimplification, and you'd obviously be using business keys in place of attribute labels like ABEC Rating, etc.). It lists all attributes.
Expressions
Finally, you would have a table for expressions. These expressions would be stored as rows in the table. They may be relational with other expressions (recursive keys), but should not be relational with your tables. Rather, they should use a mixture of boolean logic, predefined functions, and the surrogate keys from your previous tables to specify the actual configurations available. These are generally NOT reusable (but can be combined with recursive keys for a bit more re-usability).
There are a variety of expression languages out there, some proprietary some open. I manage a custom built product configuration model that uses DMN (from the people who brought you BPMN) to express my statements.
Additionally, I have seen people use XML, XSLT, and XPath in place of the relational model listed above. An expression row might look something like the following
(/component/id#1 & (/attribute/#id == 6 | /attribute/#id == 7))
In Conclusion
Like any software system, abstraction is key. I have seen almost all CPQ and product configuration models boil down to these 4 abstractions (with hundreds of other abstractions in between). Unless this is an academic exercise, I highly suggest you find a COTS solution. Knowing your products well enough to abstract between major, minor, and attributes is key, but the bread and butter (and unfortunately the "least clean" part) is definitely the expression language you store in your tables.
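Storing all the combinations isn't such a bad idea. But, you could also use wildcards. Your conditions would look like:
+-----------------------+-------+-------+-------+
| style                 | size  | color | price |
+-----------------------+-------+-------+-------+
| NULL                  | NULL  | green | $1    |
| collared              | large | blue  | $2    |
| short sleeve          | NULL  | red   | $3    |
| long sleeve, collared | NULL  | red   | $4    |
+-----------------------+-------+-------+-------+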
If you have only a handful of different prices, then this is probably okay. However, querying such a table would be less efficient than expanding it out for every combination.
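A lookup against such a wildcard table could be sketched as follows, using SQLite from Python. Two things here are assumptions and not part of the answer above: the combined "long sleeve, collared" row is split into one row per style so each cell holds a single value, and when several rules match, the one with the most non-NULL columns wins. The `price_rules` table and `price_for` helper are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE price_rules (style TEXT, size TEXT, color TEXT, price INTEGER);
    -- NULL means "any value" for that column.
    INSERT INTO price_rules VALUES
        (NULL,           NULL,    'green', 1),
        ('collared',     'large', 'blue',  2),
        ('short sleeve', NULL,    'red',   3),
        ('long sleeve',  NULL,    'red',   4),
        ('collared',     NULL,    'red',   4);
""")

# A rule matches when each column is either a wildcard or equal to the
# requested value; the ORDER BY prefers the most specific matching rule.
LOOKUP_SQL = """
    SELECT price
    FROM price_rules
    WHERE (style IS NULL OR style = :style)
      AND (size  IS NULL OR size  = :size)
      AND (color IS NULL OR color = :color)
    ORDER BY (style IS NOT NULL) + (size IS NOT NULL) + (color IS NOT NULL) DESC
    LIMIT 1
"""

def price_for(style, size, color):
    """Return the price for a configuration, or None if it is not offered."""
    row = conn.execute(
        LOOKUP_SQL, {"style": style, "size": size, "color": color}
    ).fetchone()
    return row[0] if row else None

print(price_for("short sleeve", "medium", "red"))
print(price_for("short sleeve", "small", "blue"))
```

A configuration with no matching rule, such as a small short sleeve blue shirt, comes back as `None`, which doubles as the "not available" check.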

converting table with many columns to many tables with two columns

Is it possible to convert a table with many columns to many tables of two columns without losing data?
I will show what I mean:
Let's say I have a table
+------------+----------+-------------+
|country code| site | advertiser |
+------------+----------+-------------|
| US | facebook | Cola |
| US | yahoo | Pepsi |
| FR | facebook | BMW |
| FR | yahoo | BMW |
+------------+----------+-------------+
The number of rows = [(number of countries) X (number of sites)] and the advertiser column is a variable that gets a value from a list with a limited number of advertisers
Is it possible to transform the 3 columns table to several tables with 2 columns without losing data?
If I create two tables like this, I will surely lose data:
+------------+------------+
|country code| advertiser |
+------------+------------+
| US | Cola,Pepsi |
|-------------------------|
| FR | BMW |
+-------------------------+
+------------+------------+
| site | advertiser |
+------------+------------+
| facebook | Cola,BMW |
|-------------------------|
| yahoo | Pepsi,BMW |
+-------------------------+
But if I add a third "connection" table, will this help keep all the data and make it possible to recreate the original table?
+--------------+--------------------+
| country code | site |
+--------------+--------------------+
| US | facebook,yahoo |
|-----------------------------------|
| FR | facebook,yahoo |
+-----------------------------------+
Whether the table you specify can be 'converted' into multiple tables is determined by whether the table is in fifth normal form, i.e. if and only if every non-trivial join dependency in it is implied by the candidate keys.
If the table is in fifth normal form then it cannot be converted into multiple tables. If the table is not in fifth normal form then it is in one of the four lower normal forms and can be further normalized into fifth normal form by 'converting' it into multiple tables.
A table's normal form is determined by the column dependencies. These are determined by the meaning of the table i.e. what this table represents in the real world. You have not stated what the meaning of this table is and so whether this particular table can be converted into multiple tables is unknown.
You need to understand the process of normalization; using it, you should be able to determine, based on the column dependencies in the table, whether it is possible to convert a table with many columns into many tables of two columns without losing data.
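To get a feel for what "without losing data" means, one can project a concrete table onto its three column pairs, join the projections back together, and compare with the original. The Python sketch below does that for the sample rows and for a made-up counterexample. Note that this only tests one particular set of rows: fifth normal form is about the dependencies that must hold for every valid state of the table, which the data alone cannot prove. The function names are hypothetical.

```python
def project(rows, *cols):
    """Project a set of tuples onto the given column positions."""
    return {tuple(row[c] for c in cols) for row in rows}

def rejoin(rows):
    """Natural join of the three two-column projections of a 3-column table."""
    country_site = project(rows, 0, 1)
    country_adv = project(rows, 0, 2)
    site_adv = project(rows, 1, 2)
    return {
        (country, site, adv)
        for (country, site) in country_site
        for (country2, adv) in country_adv
        if country2 == country and (site, adv) in site_adv
    }

def is_lossless(rows):
    """True if joining the projections reproduces exactly the original rows."""
    return rejoin(rows) == set(rows)

original = {
    ("US", "facebook", "Cola"),
    ("US", "yahoo", "Pepsi"),
    ("FR", "facebook", "BMW"),
    ("FR", "yahoo", "BMW"),
}

# Made-up data where the join produces rows that were never in the table.
counterexample = {
    ("US", "facebook", "Cola"),
    ("US", "yahoo", "Pepsi"),
    ("FR", "facebook", "Pepsi"),
    ("FR", "yahoo", "Cola"),
}

print(is_lossless(original))
print(is_lossless(counterexample), len(rejoin(counterexample)))
```

The sample rows happen to survive the round trip, while the counterexample comes back with spurious rows, which is exactly the kind of data loss the question worries about.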
You may be looking for Entity-Attribute-Value. Certainly it is much better than your proposal for keeping field values organized and not requiring a search of the field to determine if a value is present.

Database design: I want a column value to determine which table to query

I don't have much experience in designing databases. I want a column value to determine which table to query, and I don't know if there is a better method for this. Here is the concrete problem for better understanding:
I am designing a database for a survey creator application. I want to store different kind of questions (for example: multiple choice questions and basic text question). I have the following tables:
QUESTION
| ID | Title | TypeID |
----------------------------------------------
| 1 | "Pick a num from 1-10" | 1 |
| 2 | "Choose some from the list:" | 2 |
TYPE
| ID | Name | ExtraValues |
--------------------------------------------
|1 |Scale Question |ScaleValues |
|2 |Multiple Choice |MultiValues |
SCALE VALUES
|Question_ID | Min | Max |
--------------------------
|1 | 1 |10 |
MULTI VALUES
|Question_ID | Name | Value |
--------------------------------
|2 | Sugar | 10 |
|2 | Milk | 20 |
|2 | Egg | 14 |
So from now on, if a question is of the "Multiple choice" type, then I want to check the MULTI VALUES table, else the SCALE VALUES table. I can do it with a stored procedure, or I can just query all the SOMETHING VALUES tables for the question_ID. But is there a better way to do it?
You can certainly design your database that way. However, you can't grab the "ExtraValues" column in a query and have that automagically pull that table into the query, not without dynamically executed SQL. Your best bet is to use branching logic on the question type to determine where to get the other related data.
You could also move the min and max fields into the QUESTION table and do away with the ScaleValues table completely. You could just set them to NULL if it's a multiple choice question.
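A minimal sketch of the branching approach, with the branch living in application code and SQLite standing in for the real database. The snake_case table names, the `min_value`/`max_value` columns, and the `load_extra_values` helper are illustrative assumptions, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question (id INTEGER PRIMARY KEY, title TEXT, type_id INTEGER);
    CREATE TABLE scale_values (question_id INTEGER, min_value INTEGER, max_value INTEGER);
    CREATE TABLE multi_values (question_id INTEGER, name TEXT, value INTEGER);

    INSERT INTO question VALUES
        (1, 'Pick a num from 1-10', 1),
        (2, 'Choose some from the list:', 2);
    INSERT INTO scale_values VALUES (1, 1, 10);
    INSERT INTO multi_values VALUES (2, 'Sugar', 10), (2, 'Milk', 20), (2, 'Egg', 14);
""")

SCALE_QUESTION, MULTIPLE_CHOICE = 1, 2

def load_extra_values(question_id):
    """Look up the question type first, then query the matching detail table."""
    row = conn.execute(
        "SELECT type_id FROM question WHERE id = ?", (question_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no question with id {question_id}")
    type_id = row[0]
    if type_id == SCALE_QUESTION:
        return conn.execute(
            "SELECT min_value, max_value FROM scale_values WHERE question_id = ?",
            (question_id,),
        ).fetchall()
    if type_id == MULTIPLE_CHOICE:
        return conn.execute(
            "SELECT name, value FROM multi_values WHERE question_id = ? ORDER BY name",
            (question_id,),
        ).fetchall()
    raise ValueError(f"unknown question type {type_id}")

print(load_extra_values(1))
print(load_extra_values(2))
```

The table name never comes from data, so no dynamic SQL is needed; adding a question type means adding one more branch.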
I think there is definitely a better way to do it. Set up a many-to-many relationship between questions and available answers. Add a third column, named points. So your three tables would be:
Question - QuestionId and Text
Answer - AnswerId and Text
QuestionAnswer - QuestionId, AnswerId, and Points.
Award 0 points for wrong answers.
This design might be too simple. You might need a Test Table as well. Then you would need a TestId field in that many to many table, which would now be called, TestQuestionAnswer.

How can I best extract transitions in a transactional table?

Hypothetical example:
I have an SQL table that contains a billion or so transactions:
| Cost | DateTime |
| 1.00 | 2009-01-02 |
| 2.00 | 2009-01-03 |
| 2.00 | 2009-01-04 |
| 3.00 | 2009-01-05 |
| 1.00 | 2009-01-06 |
...
What I want is to pare down the data so that I only see the cost transitions:
| Cost | DateTime |
| 1.00 | 2009-01-02 |
| 2.00 | 2009-01-03 |
| 3.00 | 2009-01-05 |
| 1.00 | 2009-01-06 |
...
The simplest (and slowest) way to do this is to iterate over the entire table, tracking the changes. Is there a faster/better way to do this in SQL?
No. There is no faster way. You could write a query that does the same job but it will be much slower. You (as a developer) know that you need to compare a value only with its direct previous value, and there is no way to specify this with SQL. So you can do optimizations that SQL cannot.
So I imagine the fastest is to write a program that streams the results from the disk, holding in RAM only the last valid value and the current one (filtering out every value that is equal to the last valid).
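The streaming idea described above could look roughly like this in Python: a generator that keeps only the last emitted cost in memory. The `transitions` name is hypothetical, and the list stands in for rows streamed from the table in DateTime order (for example from a database cursor).

```python
_UNSET = object()  # sentinel: no row has been emitted yet

def transitions(rows):
    """Yield only the rows whose cost differs from the previously emitted cost.

    `rows` is any iterable of (cost, timestamp) pairs, already sorted by time.
    """
    last_cost = _UNSET
    for cost, timestamp in rows:
        if cost != last_cost:
            yield cost, timestamp
            last_cost = cost

sample = [
    (1.00, "2009-01-02"),
    (2.00, "2009-01-03"),
    (2.00, "2009-01-04"),
    (3.00, "2009-01-05"),
    (1.00, "2009-01-06"),
]

result = list(transitions(sample))
for cost, timestamp in result:
    print(f"{cost:.2f} {timestamp}")
```

Memory use stays constant regardless of table size, since only one previous value is remembered.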
This is a classic example of trying to use a sledge hammer when a hammer is needed. You want to extract some crazy reporting data out of a table but to do so is going to KILL your SQL Server. What you need to do to track changes is to create a tracking table specifically for this purpose. Then use a trigger that records a change in value in a product into this table. So on my products table, when I change the price it goes into the price tracking table.
If you are using this to track stock prices or something similar then again you use the same approach except you do a comparison of the price table and if a change occurs you save it. So the comparison only happens with new data, all the old comparisons are still housed in one location so you don't need to rerun the query which is going to kill your SQL Server's performance.
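As a rough illustration of the tracking-table idea, here is a trigger that records a row only when a price actually changes. It is written for SQLite so it can run from Python's standard library; SQL Server's trigger syntax differs (it works with the `inserted` and `deleted` pseudo-tables), and the `products` / `price_history` names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE price_history (product_id INTEGER, old_price REAL, new_price REAL);

    -- Fires after an UPDATE of price, but only when the value really changed.
    CREATE TRIGGER track_price_change
    AFTER UPDATE OF price ON products
    WHEN OLD.price <> NEW.price
    BEGIN
        INSERT INTO price_history (product_id, old_price, new_price)
        VALUES (OLD.id, OLD.price, NEW.price);
    END;

    INSERT INTO products VALUES (1, 1.00);
""")

# Same sequence of costs as in the question: 1 -> 2 -> 2 -> 3 -> 1.
for new_price in (2.00, 2.00, 3.00, 1.00):
    conn.execute("UPDATE products SET price = ? WHERE id = 1", (new_price,))

history = conn.execute(
    "SELECT old_price, new_price FROM price_history ORDER BY rowid"
).fetchall()
print(history)
```

The repeated 2.00 update leaves no trace in `price_history`, so the transitions can later be read directly from that table instead of rescanning all transactions.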