Create SQL query with dynamic WHERE statement - sql

I'm using PostgreSQL as my database, in case that's helpful, although I'd prefer a pure SQL approach over a PostgreSQL-specific implementation.
I have a large set of test data obtained from manufacturing a piece of electronics, and I'd like to extract from it which units met certain criteria during test, ideally using a separate table that contains the test criteria for each step of manufacturing.
As a simple example, let's say I check the temperature readback from the unit in two different steps of the test. In step 1, the temperature should be in the range of 20C-30C while step 2 should be in the range of 50C-60C.
Let's assume the following table structure with a set of example data (table name 'test_data'):
temperature step serial_number
25 1 1
55 2 1
19 1 2
20 2 2
and let's assume the following table that contains the above mentioned pass criteria (table name 'criteria'):
temperature_upper temperature_lower step
20 30 1
50 60 2
At the moment, using a static approach, I can just use the following query:
SELECT * FROM test_data WHERE
( test_data.step = 1 AND test_data.temperature > 20 AND test_data.temperature < 30 ) OR
( test_data.step = 2 AND test_data.temperature > 50 AND test_data.temperature < 60 );
which would effectively yield the following table:
temperature step serial_number
25 1 1
55 2 1
I'd like to make my select query more dynamic: instead of being statically defined, it should construct itself from the rows of the criteria table. The hope is to grow this into a complex query where temperature, voltage and current might be checked in step 1 but only current in step 2, for example.
Thanks for any insight!

You can solve this using a join between the two tables:
SELECT t.*
FROM test_data t
INNER JOIN criteria c ON t.step = c.step
AND t.temperature > c.temperature_upper
AND t.temperature < c.temperature_lower
Or, if you want >= and <= semantics:
SELECT t.*
FROM test_data t
INNER JOIN criteria c ON t.step = c.step
AND t.temperature BETWEEN c.temperature_upper AND c.temperature_lower
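If you later add more checks per step (the OP mentions voltage and current), one approach is to widen the criteria table with a pair of bound columns per measurement and treat NULL as "no limit for this step". A minimal sketch, assuming hypothetical voltage_upper/voltage_lower and current_upper/current_lower columns on criteria plus matching voltage and current_draw columns on test_data; the BETWEEN arguments keep the order used above, since in the sample data the *_upper column holds the smaller value:
SELECT t.*
FROM test_data t
INNER JOIN criteria c ON t.step = c.step
-- a measurement passes when no bound is defined for that step, or when the reading is in range
AND (c.temperature_upper IS NULL OR t.temperature BETWEEN c.temperature_upper AND c.temperature_lower)
AND (c.voltage_upper IS NULL OR t.voltage BETWEEN c.voltage_upper AND c.voltage_lower)
AND (c.current_upper IS NULL OR t.current_draw BETWEEN c.current_upper AND c.current_lower)
This way a new check only means two more columns in criteria (or one row per measurement in a fully normalized layout) rather than another hand-written OR branch.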

Related

SQL select command SUM across 3 related tables

I've changed my DB structure to make it more future proof. Now I'm having trouble with the new select query.
I have a table called activities that has a list of activities and how many steps per minute each activity was worth. The table was structured like this:
Activities
id act_name act_steps
12 Boxing 250
14 Karate 300
17 Yoga 89
I have another table called distance that is structured like this:
Distance
id dist_activity_id dist_activity_duration member_id
1 12 60 12
2 14 90 12
3 17 30 12
I have a query that would SUM and produce a total for all activities in the distance table:
SELECT ROUND(SUM(act_steps * dist_activity_duration / 2000),2) AS total_miles
FROM distance,
activities
WHERE activities.id = distance.dist_activity_id
This worked fine.
To future proof it in case the number of steps for an activity changes, I've set up a table called steps that is structured like this:
Steps
id activity_steps
1 6
2 250
3 300
4 89
I then updated the activities table, removing the act_steps column and replacing it with steps_id so it now looks like this:
Updated activities
id act_name steps_id
12 Boxing 2
14 Karate 3
17 Yoga 4
I'm not sure how to create the select command to get the SUM using the new structure.
Could someone please help me with this?
Thanks
Wayne
Learn to use proper JOIN syntax! Your query should look like:
SELECT ROUND(SUM(a.act_steps * d.dist_activity_duration / 2000), 2) AS total_miles
FROM distance d JOIN
activities a
ON a.id = d.dist_activity_id;
If you need to lookup the steps, then add another JOIN:
SELECT ROUND(SUM(s.activity_steps * d.dist_activity_duration / 2000), 2) AS total_miles
FROM distance d JOIN
activities a
ON a.id = d.dist_activity_id JOIN
steps s
ON s.id = a.steps_id;
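If you want one total per member rather than a single grand total (assuming member_id in distance identifies the member, as in the sample data), the same join just needs a GROUP BY:
SELECT d.member_id,
       ROUND(SUM(s.activity_steps * d.dist_activity_duration / 2000), 2) AS total_miles
FROM distance d JOIN
     activities a
     ON a.id = d.dist_activity_id JOIN
     steps s
     ON s.id = a.steps_id
GROUP BY d.member_id;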

Extract only values that are greater than a threshold in another table in InfluxDB

I am using InfluxDB and I would like to extract values which are greater than a certain threshold stored in another table.
For example, I have two tables as shown below.
Table A
Time value
1 15
2 25
3 9
4 22
Table B
Time threshold
1 16
2 12
3 13
4 15
Given the above two tables, I would like to extract the values which are greater than the threshold in the first row of Table B. So what I want to have is as below.
Time value
2 25
4 22
I tried the SQL query below, but it didn't give the correct result.
select * from data1 where value > (select spec from spec1 limit1);
Look forward to your feedback.
Thanks.
Integrate the condition into an inner join:
select * from tableA as a
inner join tableB as b on a.Time = b.Time and a.value > b.threshold
When your time column doesn't only include integer values, you have to format the time and join on a time range. Here is an example:
SQL join on time range
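Alternatively, if you really only need the threshold from the first row of Table B (16 in the example), a plain-SQL sketch with a scalar subquery would look like the following; in InfluxDB you may instead have to read that threshold first and substitute the literal value. The table and column names (tableA, tableB, threshold) are just the ones from the example:
select * from tableA
where value > (select threshold from tableB order by Time limit 1);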

Can I populate the 'in' variable using a list of values stored in one field

I hope I'm not trying to do something that is not possible...
In my DB, the query below works and gets the values I want.
select LabelID, Amount
from tCASpreadsData
where LabelID in (3,4,5,7,9,10,11,12,16,17,18,19,21,22,23,24,28,29,30)
However, I don't want to build the list of LabelIDs manually each time. I also don't have a way to logically select them. So, I created a table with all the values listed in one field.
The query below finds the list I want in a field called SumA:
select SumA from tlCECLRatio where CATemplateID = 1 and LabelID = 148
which returns:
(3,4,5,7,9,10,11,12,16,17,18,19,21,22,23,24,28,29,30)
However, when I combine the two queries, I get nothing.
SELECT LabelID, Amount
FROM tCASpreadsData
WHERE convert(nvarchar(255),LabelID) in
(Select SumA from tlCECLRatio where CATemplateID = 1 and LabelID = 148)
How can I use the value of SumA to create the 'in' list in my where clause?
You can try it like below; no conversion is needed:
SELECT LabelID, Amount FROM tCASpreadsData where LabelID in
(
Select SumA from tlCECLRatio where CATemplateID = 1 and LabelID = 148
)
This will work:
select LabelID, Amount
from tCASpreadsData
where LabelID in (
select SumA
from tlCECLRatio
where CATemplateID = 1 and LabelID = 148
)
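Note that if SumA actually stores the whole list as a single comma-separated string (as the sample output suggests), the subquery returns one long string and LabelID will never match it. On SQL Server 2016 or later (the nvarchar conversion suggests SQL Server), one option is to split that string first; a sketch:
SELECT LabelID, Amount
FROM tCASpreadsData
WHERE LabelID IN (
    SELECT s.value          -- one label id per row, split out of the stored string
    FROM tlCECLRatio r
    CROSS APPLY STRING_SPLIT(r.SumA, ',') s
    WHERE r.CATemplateID = 1 AND r.LabelID = 148
)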
From your OP, it seems like you have the freedom to define the interim table (tlCECLRatio) however you like. So, I would like to suggest that you define it without a varchar field and instead use all integer fields. Here's how it would look with the values you have provided:
CATemplateID LabelID
1 3
1 4
1 5
1 7
1 9
1 10
1 11
1 12
1 16
1 17
1 18
1 19
1 21
1 22
1 23
1 24
1 28
1 29
1 30
If you need other collections of labels, you would give them a new template ID. Each collection is therefore defined by the value of CATemplateID.
To query the values you want, it's a simple join.
select SD.LabelID, SD.Amount
from tCASpreadsData SD inner join tlCECLRatio CR
on SD.LabelID = CR.LabelID
where CR.CATemplateID = 1
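To add another collection of labels later, you would just insert rows under a new template ID, for example (hypothetical values):
insert into tlCECLRatio (CATemplateID, LabelID)
values (2, 3), (2, 7), (2, 30)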
Side Note: I have been taught that the interim table also needs its own row ID, so I would probably define it as CECLRatioValue(RatioValueID, CATemplateID, LabelID) where RatioValueID is a sequence (or autonumber) value. But, that might be overkill for a simple cross-reference table. Just pointing out what has been recommended to me as good database practice.

BigQuery full table to partition

I have 340 GB of data in one table (270 days' worth of data). I am now planning to move this data to a partitioned table.
That means I will have 270 partitions. What is the best way to move this data to the partitioned table?
I don't want to run 270 queries, which would be a very costly operation, so I'm looking for an optimized solution.
I have multiple tables like this. I need to migrate all of them to partitioned tables.
Thanks,
I see three options:
Direct Extraction out of original table:
Actions (how many queries to run) = Days [to extract] = 270
Full Scans (how much data scanned, measured in full scans of the original table) = Days = 270
Cost, $ = $5 x Table Size, TB x Full Scans = $5 x 0.34 x 270 = $459.00
Hierarchical (recursive) Extraction: (described in Mosha's answer)
Actions = 2^log2(Days) - 2 = 510
Full Scans = 2*log2(Days) = 18
Cost, $ = $5 x Table Size, TB x Full Scans = $5 x 0.34 x 18 = $30.60
Clustered Extraction: (I will describe it in a sec)
Actions = Days + 1 = 271
Full Scans = [always] 2
Cost, $ = $5 x Table Size, TB x Full Scans = $5 x 0.34 x 2 = $3.40
Summary
Method Actions Total Full Scans Total Cost
Direct Extraction 270 270 $459.00
Hierarchical(recursive) Extraction 510 18 $30.60
Clustered Extraction 271 2 $3.40
Definitely, for most practical purposes Mosha's solution is the way to go (I use it in most such cases).
It is relatively simple and straightforward.
Even though you need to run the query 510 times, each query is "relatively" simple, and the orchestration logic is easy to implement with whatever client you usually use.
And the cost savings are quite visible!
From $460 down to $31!
Almost 15 times less!
In case you
a) want to lower the cost even further, by yet another 9 times (so about 135 times lower in total),
b) and like having fun and more challenges
- take a look at the third option.
“Clustered Extraction” Explanation
Idea / Goal:
Step 1
We want to transform the original table into another [single] table with 270 columns, one column per day.
Each column will hold one serialized row for the respective day from the original table.
The total number of rows in this new table will be equal to the number of rows in the "heaviest" day.
This will require just one query (see the example below) with one full scan.
Step 2
After this new table is ready, we extract day by day, querying ONLY the respective column, and write into the final daily table (the schema of the daily tables is the very same as the original table's schema, and all those tables can be pre-created).
This will require 270 queries to be run, with scans approximately equivalent in total (this really depends on how complex your schema is, so it can vary) to one full scan of the original table.
While querying a column, we will need to de-serialize the row's value and parse it back into the original schema.
Very simplified example: (using BigQuery Standard SQL here)
The purpose of this example is just to give direction, in case you find the idea interesting.
Serialization / de-serialization is extremely simplified to keep the focus on the idea and less on a particular implementation, which can differ from case to case (it mostly depends on the schema).
So, assume the original table (theTable) looks somewhat like below:
SELECT 1 AS id, "101" AS x, 1 AS ts UNION ALL
SELECT 2 AS id, "102" AS x, 1 AS ts UNION ALL
SELECT 3 AS id, "103" AS x, 1 AS ts UNION ALL
SELECT 4 AS id, "104" AS x, 1 AS ts UNION ALL
SELECT 5 AS id, "105" AS x, 1 AS ts UNION ALL
SELECT 6 AS id, "106" AS x, 2 AS ts UNION ALL
SELECT 7 AS id, "107" AS x, 2 AS ts UNION ALL
SELECT 8 AS id, "108" AS x, 2 AS ts UNION ALL
SELECT 9 AS id, "109" AS x, 2 AS ts UNION ALL
SELECT 10 AS id, "110" AS x, 3 AS ts UNION ALL
SELECT 11 AS id, "111" AS x, 3 AS ts UNION ALL
SELECT 12 AS id, "112" AS x, 3 AS ts UNION ALL
SELECT 13 AS id, "113" AS x, 3 AS ts UNION ALL
SELECT 14 AS id, "114" AS x, 3 AS ts UNION ALL
SELECT 15 AS id, "115" AS x, 3 AS ts UNION ALL
SELECT 16 AS id, "116" AS x, 3 AS ts UNION ALL
SELECT 17 AS id, "117" AS x, 3 AS ts UNION ALL
SELECT 18 AS id, "118" AS x, 3 AS ts UNION ALL
SELECT 19 AS id, "119" AS x, 4 AS ts UNION ALL
SELECT 20 AS id, "120" AS x, 4 AS ts
Step 1 – transform table and write result into tempTable
SELECT
num,
MAX(IF(ts=1, ser, NULL)) AS ts_1,
MAX(IF(ts=2, ser, NULL)) AS ts_2,
MAX(IF(ts=3, ser, NULL)) AS ts_3,
MAX(IF(ts=4, ser, NULL)) AS ts_4
FROM (
SELECT
ts,
CONCAT(CAST(id AS STRING), "|", x, "|", CAST(ts AS STRING)) AS ser,
ROW_NUMBER() OVER(PARTITION BY ts ORDER BY id) num
FROM theTable
)
GROUP BY num
tempTable will look like below:
num ts_1 ts_2 ts_3 ts_4
1 1|101|1 6|106|2 10|110|3 19|119|4
2 2|102|1 7|107|2 11|111|3 20|120|4
3 3|103|1 8|108|2 12|112|3 null
4 4|104|1 9|109|2 13|113|3 null
5 5|105|1 null 14|114|3 null
6 null null 15|115|3 null
7 null null 16|116|3 null
8 null null 17|117|3 null
9 null null 18|118|3 null
Here, I am using simple concatenation for serialization
Step 2 – extracting rows for specific day and write output to respective daily table
Please note: in the example below we are extracting rows for ts = 2; this corresponds to column ts_2.
SELECT
r[OFFSET(0)] AS id,
r[OFFSET(1)] AS x,
r[OFFSET(2)] AS ts
FROM (
SELECT SPLIT(ts_2, "|") AS r
FROM tempTable
WHERE NOT ts_2 IS NULL
)
The result will look like below (which is expected):
id x ts
6 106 2
7 107 2
8 108 2
9 109 2
I wish I had more time to write this down, so don't judge too heavily if something is missing; this is more of a directional answer. At the same time, the example is pretty reasonable, and if you have a plain simple schema, almost no extra thinking is required. Of course, with records and nested stuff in the schema, the most challenging part is the serialization / de-serialization, but that's where the fun is, along with the extra $ saving.
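If the schema is wide or nested, one hedged alternative to the hand-rolled CONCAT / SPLIT serialization above is to let BigQuery serialize whole rows to JSON; a sketch of the same two steps using TO_JSON_STRING and JSON_EXTRACT_SCALAR (column names from the toy example):
-- Step 1: serialize each row as JSON instead of a concatenated string
SELECT
  num,
  MAX(IF(ts=1, ser, NULL)) AS ts_1,
  MAX(IF(ts=2, ser, NULL)) AS ts_2,
  MAX(IF(ts=3, ser, NULL)) AS ts_3,
  MAX(IF(ts=4, ser, NULL)) AS ts_4
FROM (
  SELECT
    ts,
    TO_JSON_STRING(t) AS ser,
    ROW_NUMBER() OVER(PARTITION BY ts ORDER BY id) num
  FROM theTable t
)
GROUP BY num
-- Step 2: parse the JSON back into the original columns (here for ts = 2)
SELECT
  CAST(JSON_EXTRACT_SCALAR(ts_2, '$.id') AS INT64) AS id,
  JSON_EXTRACT_SCALAR(ts_2, '$.x') AS x,
  CAST(JSON_EXTRACT_SCALAR(ts_2, '$.ts') AS INT64) AS ts
FROM tempTable
WHERE ts_2 IS NOT NULL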
I will add a fourth option to #Mikhail's answer:
DML QUERY
Actions = 1 query to run
Full Scans = 1
Cost = $5 x 0.34 = $1.70 (270 times cheaper than solution #1 \o/)
With the new DML feature of BigQuery you can convert a non-partitioned table to a partitioned one while doing only one full scan of the source table.
To illustrate my solution I will use one of BQ's public tables, namely bigquery-public-data:hacker_news.comments. Below is the table's schema:
name    | type      | description
--------|-----------|------------
id      | INTEGER   | ...
by      | STRING    | ...
author  | STRING    | ...
...     |           |
time_ts | TIMESTAMP | human readable timestamp in UTC YYYY-MM-DD hh:mm:ss /!\ /!\ /!\
...     |           |
We are going to partition the comments table based on time_ts:
#standardSQL
CREATE TABLE my_dataset.comments_partitioned
PARTITION BY DATE(time_ts)
AS
SELECT *
FROM `bigquery-public-data.hacker_news.comments`
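Applied to your own table, the same pattern would be (with my_dataset.original_table and its timestamp column ts as hypothetical placeholders):
#standardSQL
CREATE TABLE my_dataset.original_table_partitioned
PARTITION BY DATE(ts)
AS
SELECT *
FROM `my_dataset.original_table`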
I hope it helps :)
If your data was in sharded tables (i.e. with a YYYYmmdd suffix), you could have used the "bq partition" command. But with data in a single table, you will have to scan it multiple times, applying different WHERE clauses on your partition key column.
The only optimization I can think of is to do it hierarchically, i.e. instead of 270 queries which will do 270 full table scans, first split the table in half, then each half in half, etc. This way you will need to pay for 2*log_2(270) = 2*9 = 18 full scans.
Once the conversion is done - all the temporary tables can be deleted to eliminate extra storage costs.
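To make the hierarchical idea concrete, the first level of splitting could look like the two statements below (a sketch with hypothetical table names and a timestamp column ts as the partition key; each half is then split in half again the same way until you reach single days):
#standardSQL
-- run each statement separately (or as a script); the midpoint date is just an illustration
CREATE TABLE my_dataset.first_half AS
SELECT * FROM `my_dataset.original_table`
WHERE DATE(ts) < DATE '2017-05-15';

CREATE TABLE my_dataset.second_half AS
SELECT * FROM `my_dataset.original_table`
WHERE DATE(ts) >= DATE '2017-05-15';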

check if multiple values are within corresponding (multiple) ranges in 1 query

Going off of this question from a few years back: MS Access 2003: Check if data is within range from another table
What would I do if I want to check whether multiple values are within multiple ranges in one query? I'm trying to find the points per test that a person took. I have different ranges for each test, since one test may be out of 100 while another is only out of 10.
Would I have to create many queries using the answer in the question linked above and combine or is there an easier way?
Table A:
Name Test1_Score Test2_Score Test3_Score
Person A 205 98 5
Person B 105 88 8
Person C 400 89 10
Table B:
Points Test1_GradeReq Test2_GradeReq Test3_GradeReq
1 0 0 0
2 300 30 1
3 300 70 2
4 400 100 3
The following gets the number of points for each test and each person:
select a.name, a.Test1_Score, a.Test2_Score, a.Test3_Score,
       sum(iif(a.test1_score >= b.test1_gradeReq, 1, 0)) as Test1_Points,
       sum(iif(a.test2_score >= b.test2_gradeReq, 1, 0)) as Test2_Points,
       sum(iif(a.test3_score >= b.test3_gradeReq, 1, 0)) as Test3_Points
from TableA a, -- cross join
     TableB b
group by a.name, a.Test1_Score, a.Test2_Score, a.Test3_Score
It assumes that the points increment by one for each step.
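If the points do not increase by exactly one per row, a hedged variant of the same cross join picks the largest Points value whose requirement is met instead of counting rows:
select a.name,
       max(iif(a.test1_score >= b.test1_gradeReq, b.points, 0)) as Test1_Points,
       max(iif(a.test2_score >= b.test2_gradeReq, b.points, 0)) as Test2_Points,
       max(iif(a.test3_score >= b.test3_gradeReq, b.points, 0)) as Test3_Points
from TableA a, TableB b
group by a.name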