I have two tables:
Master_Equipment_Index (alias mei) containing the columns serial_num & model_num
Customer Equipment Index (alias cei) containing the columns account_num, serial_num, & model_num
Originally, no guard rails were implemented to require a model attribute whenever new serial_num records were inserted into the mei data. When such a serial_num is later associated with a customer account in the cei data, the model carries over as null.
What I want to do is backfill the missing model attributes in the cei data from the mei data based on the strongest sequential character match from other similar serial_nums in the mei data.
To further clarify, I don't have access to mass update the mei or cei datasets. I can formalize change requests, but I need to build the function out to prove its worth. So this has to be done outside of any mass action query updates.
| cei.account_num | cei.serial_num | cei.model | mei.serial_num | mei.model | serial_num_str_match | row_number |
| --------------- | -------------- | --------- | -------------- | ------------- | -------------------- | ---------- |
| 123123123 | B4I4SXT1708 | null | B4I4SXT178A | Model_Series1 | 8 | 1 |
| 123123123 | B4I4SXT1708 | null | B4I4SXTAS34 | Model_Series2 | 7 | 2 |
In the table example above, row_number 1 has a higher consecutive string match count than row_number 2. I want to return only row_number 1 and populate cei.model with mei.model's value.
Desired result:

| cei.account_num | cei.serial_num | cei.model | mei.serial_num | mei.model | serial_num_str_match | row_number |
| --------------- | -------------- | --------- | -------------- | ------------- | -------------------- | ---------- |
| 123123123 | B4I4SXT1708 | Model_Series1 | B4I4SXT178A | Model_Series1 | 8 | 1 |
To give an idea of the scale:
The mei data contains 1 million records and the cei data contains 50,000 records. I would have to perform this string match for every cei.account_num, cei.serial_num pair where the cei.model data is null.
With MAC addresses, the first 6 characters identify the vendor; I take a similar approach in the sample SQL below to help reduce the volume of transactional 1:many lookups taking place:
/* need to define function */
create temp function string_match_function(x any type, y any type) as (
  -- syntax to generate consecutive string count matches between x and y
);
select * from (
  select
    a.account_num,
    a.serial_num,
    a.model,
    row_number() over(partition by a.account_num, a.serial_num order by a.serial_num_str_match desc) seq
  from (
    select
      c.account_num,
      c.serial_num,
      m.model,
      string_match_function(c.serial_num, m.serial_num) as serial_num_str_match -- needed: the function to define
    from (
      select * from cei where model is null
    ) c
    join (
      select * from mei where model is not null
    ) m on substr(c.serial_num,1,6) = substr(m.serial_num,1,6)
  ) as a
) as b
where seq = 1
I've looked at different options, some coming from https://hoffa.medium.com/new-in-bigquery-persistent-udfs-c9ea4100fd83, but I'm not finding what I need.
Any insight or direction would be greatly appreciated.
This UDF counts the number of equal characters at the beginning of the two strings:
CREATE TEMP FUNCTION string_match_function(x string, y string)
RETURNS int64
LANGUAGE js
AS r"""
  // walk both strings in lockstep and return the index of the first mismatch,
  // i.e. the number of equal leading characters
  var i = 0;
  var max_len = Math.min(x.length, y.length);
  for (i = 0; i < max_len; i++) {
    if (x[i] != y[i]) { return i; }
  }
  return i;
""";
select string_match_function("12a345","1234")
gives 2, because both start with 12
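If you'd rather stay in pure SQL, the same leading-match count can be sketched as a SQL UDF. This is a minimal sketch assuming BigQuery Standard SQL (GENERATE_ARRAY/UNNEST); the name string_match_function_sql is just illustrative:

CREATE TEMP FUNCTION string_match_function_sql(x STRING, y STRING)
RETURNS INT64 AS (
  -- position of the first mismatching character, minus 1;
  -- if the shorter string is a prefix of the other, its full length
  IFNULL(
    (SELECT MIN(pos) - 1
     FROM UNNEST(GENERATE_ARRAY(1, LEAST(LENGTH(x), LENGTH(y)))) AS pos
     WHERE SUBSTR(x, pos, 1) != SUBSTR(y, pos, 1)),
    LEAST(LENGTH(x), LENGTH(y))
  )
);

select string_match_function_sql("12a345","1234") -- also gives 2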
I have two tables in SQL. I need to add rows from one table to another. The table to which I add rows looks like:
timestamp, deviceID, value
2020-10-04, 1, 0
2020-10-04, 2, 0
2020-10-07, 1, 1
2020-10-08, 2, 1
But I should add a row to this table only if the value for a particular deviceID has changed compared to its last timestamp.
For example, the record "2020-10-09, 2, 1" won't be added, because the value wasn't changed for deviceID = 2 at the last timestamp "2020-10-08". At the same time, the record "2020-10-09, 1, 0" will be added, because the value for deviceID = 1 was changed to 0.
I have a problem with writing a query for this logic. I have written something like this:
insert into output
select *
from values
where value != (
select value
from output
where timestamp = (select max(timestamp) from output) and output.deviceID = values.deviceID)
Of course it doesn't work because of the last part of the query "and output.deviceID = values.deviceID".
Actually, the problem is that I don't know how to take the value from the "output" table where the deviceID is the same as in the row I'm trying to insert.
I would use order by and something to limit to one row:
insert into output
    select *
    from values v
    where v.value <> (select o2.value
                      from output o2
                      where o2.deviceId = v.deviceId
                      order by o2.timestamp desc
                      fetch first 1 row only
                     );
The above is standard SQL. Specific databases may have other ways to express this, such as limit or top (1).
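For databases that spell it with LIMIT, here is a minimal sketch of the same idea (assuming the table really is named values as in the question, with an alias added because the correlated subquery needs one). Note that a plain <> comparison drops devices with no prior row, since value <> NULL is not true; PostgreSQL's is distinct from also lets first-time readings through:

insert into output
    select v.timestamp, v.deviceID, v.value
    from values v
    where v.value is distinct from (select o2.value
                                    from output o2
                                    where o2.deviceID = v.deviceID
                                    order by o2.timestamp desc
                                    limit 1
                                   );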
This is a simple question.
Background
I'm supposed to have a max of 400 rows in some table, based on a timestamp field, so old ones will be removed automatically. For this question, let's say the max is 3 instead.
The table has various fields, but the timestamp is what's important here.
The problem
Even though I've succeeded (looked here), for some reason it left me with one extra item, so I just adjusted the constant accordingly. This means that instead of 3, I got 4 items.
private const val MAX_ITEMS = 3
private val TIMESTAMP_FIELD = "timestamp"
private val DELETE_FROM_CALL_LOG_TILL_TRIGGER =
    String.format(
        "CREATE TRIGGER %1\$s INSERT ON %2\$s " +
            "WHEN (SELECT COUNT(*) FROM %2\$s) > %3\$d " +
            "BEGIN " +
            "DELETE FROM %2\$s WHERE %2\$s._id IN " +
            "(SELECT %2\$s._id FROM %2\$s ORDER BY %2\$s.$TIMESTAMP_FIELD DESC LIMIT %3\$d, -1); " +
            "END;",
        "delete_till_reached_max", TABLE_NAME, MAX_ITEMS - 1)
What I've tried
I tried:
Change the condition to just being insertion (meaning without the WHEN part).
Change LIMIT %3\$d, -1 to LIMIT -1 OFFSET %3\$d. Also tried a number other than -1 (tried 0, because I thought it was extra).
The questions
How come I had to use MAX_ITEMS - 1 instead of just MAX_ITEMS? Why does it leave me with 4 items instead of 3?
Does it matter if I have WHEN there? Is it better?
You have omitted the BEFORE | AFTER clause, so it's BEFORE by default. This means you are counting the rows before the insert, not after it.
This depends. At first, when the table has not reached the limit yet, the quick count lookup may save you some time, as you avoid the more complicated delete. But as soon as the table is full, you'll have to delete anyway, so counting is just additional work to do.
This should work:
private const val MAX_ITEMS = 3
private val TIMESTAMP_FIELD = "timestamp"
private val DELETE_FROM_CALL_LOG_TILL_TRIGGER =
    String.format(
        "CREATE TRIGGER %1\$s AFTER INSERT ON %2\$s " +
            "FOR EACH ROW " +
            "BEGIN " +
            "DELETE FROM %2\$s WHERE _id = " +
            "(SELECT _id FROM %2\$s ORDER BY %4\$s DESC LIMIT 1 OFFSET %3\$d); " +
            "END;",
        "delete_till_reached_max", TABLE_NAME, MAX_ITEMS, TIMESTAMP_FIELD)
Once there are 400 rows in the table, the delete will happen on every insert anyway, so you can just as well call the trigger something like trg_keep_rowcount_constant and remove any remaining row-count check from the code.
Demo: https://dbfiddle.uk/?rdbms=sqlite_3.27&fiddle=ea3867e20e85927a2de047908771f4f1
I have a problem regarding grouping when a value is the same as in the row above.
Our statement looks like this:
SELECT pat_id,
treatData.treatmentdate AS Date,
treatMeth.name AS TreatDataTableInfo,
treatData.treatmentid AS TreatID
FROM dialysistreatmentdata treatData
LEFT JOIN hdtreatmentmethods treatMeth
ON treatMeth.id = treatData.hdtreatmentmethodid
WHERE treatData.hdtreatmentmethodid IS NOT NULL
AND Year(treatData.treatmentdate) >= 2013
AND ekeyid = 12
ORDER BY treatData.ekeyid,
treatmentdate DESC,
treatdatatableinfo;
The output looks like this:
The desired output should be grouped if the value is the same as in the row/rows before, and there should be a ToDate column which, as you can see in the screenshot, is the date of the next row minus 1 day.
The desired output should look like this:
I hope someone has a solution for this matter!
Or maybe someone has an idea how to solve this problem within QlikView.
Looking forward to your solutions,
Michael
You want to collapse episodes of treatment into single rows. This is a "gaps-and-islands" problem. I like the difference-of-row-numbers approach: within a run of rows that share the same name, both row numbers increase in step, so their difference stays constant and identifies the island:
select pat_id, min(date) as fromdate, max(date) as todate, TreatDataTableInfo,
       min(treatid) as treatid
from (select td.Pat_ID, td.TreatmentDate as Date, tm.Name as TreatDataTableInfo,
             td.TreatmentID as TreatID,
             row_number() over (partition by td.Pat_ID order by td.TreatmentDate) as seqnum_p,
             row_number() over (partition by td.Pat_ID, tm.Name order by td.TreatmentDate) as seqnum_pn
      from DialysisTreatmentData td left join
           HDTreatmentMethods tm
           on tm.ID = td.HDTreatmentMethodID
      where td.HDTreatmentMethodID is not null and
            td.TreatmentDate >= '2013-01-01' and
            EKeyID = 12
     ) t
group by pat_id, TreatDataTableInfo, (seqnum_p - seqnum_pn)
order by pat_id, fromdate desc, TreatDataTableInfo;
Note: This uses the ANSI standard window function row_number(), which is available in most databases.
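To see why the difference of the two row numbers identifies the islands, here is a minimal sketch with hypothetical inline data (PostgreSQL-style VALUES list); the difference is constant within each run of equal names and changes as soon as the name changes:

with t(date, name) as (
     values ('2013-01-18', 'HDF Pradilution'),
            ('2013-01-25', 'HDF Pradilution'),
            ('2013-02-07', 'HD'),
            ('2013-02-21', 'HDF Postdilution')
)
select date, name,
       row_number() over (order by date) as seqnum_p,
       row_number() over (partition by name order by date) as seqnum_pn,
       row_number() over (order by date) -
       row_number() over (partition by name order by date) as island
from t;
-- island is 0 for the two 'HDF Pradilution' rows, 2 for 'HD', 3 for 'HDF Postdilution'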
Below is a possible QlikView solution. I've put some comments in the script. If it's not clear, just let me know. The result picture is below the script.
RawData:
Load * Inline [
Pat_ID,Date,TreatDataTableInfo,TreatId
PatNum_12,08.07.2016,HDF Pradilution,1
PatNum_12,07.07.2016,HDF Predilution,2
PatNum_12,23.03.2016,HD,3
PatNum_12,24.11.2015,HD,4
PatNum_12,22.11.2015,HD,5
PatNum_12,04.09.2015,HD,6
PatNum_12,01.09.2015,HD,7
PatNum_12,30.07.2015,HD,8
PatNum_12,12.01.2015,HD,9
PatNum_12,09.01.2015,HD,10
PatNum_12,26.08.2014,Hemodialysis,11
PatNum_12,08.07.2014,Hemodialysis,12
PatNum_12,23.05.2014,Hemodialysis,13
PatNum_12,19.03.2014,Hemodialysis,14
PatNum_12,29.01.2014,Hemodialysis,15
PatNum_12,14.12.2013,Hemodialysis,16
PatNum_12,26.10.2013,Hemodialysis,17
PatNum_12,05.10.2013,Hemodialysis,18
PatNum_12,03.10.2013,HD,19
PatNum_12,24.06.2013,Hemodialysis,20
PatNum_12,03.06.2013,Hemodialysis,21
PatNum_12,14.05.2013,Hemodialysis,22
PatNum_12,26.02.2013,HDF Postdilution,23
PatNum_12,23.02.2013,HDF Pradilution,24
PatNum_12,21.02.2013,HDF Postdilution,25
PatNum_12,07.02.2013,HD,26
PatNum_12,25.01.2013,HDF Pradilution,27
PatNum_12,18.01.2013,HDF Pradilution,28
];
GroupedData:
Load
*,
// assign new GroupId for all rows where the TreatDataTableInfo is equal
if( RowNo() = 1, 1,
if( TreatDataTableInfo <> peek('TreatDataTableInfo'),
peek('GroupId') + 1, peek('GroupId'))) as GroupId,
// assign new GroupSubId (incremental int) for all the records in each group
if( TreatDataTableInfo <> peek('TreatDataTableInfo'),
1, peek('GroupSubId') + 1) as GroupSubId,
// pick the first TreatId value and spread it across the group
if( TreatDataTableInfo <> peek('TreatDataTableInfo'), TreatId, peek('TreatId_Temp')) as TreatId_Temp
Resident
RawData
;
Drop Table RawData;
right join (GroupedData)
// get the max GroupSubId for each group and right join it to
// the GroupedData table to remove the records we dont need
MaxByGroup:
Load
max(GroupSubId) as GroupSubId,
GroupId
Resident
GroupedData
Group By
GroupId
;
// these are not needed anymore
Drop Fields GroupId, GroupSubId, TreatId;
// replace the old TreatId with the new TreatId_Temp field
// which contains the first TreatId for each group
Rename Field TreatId_Temp to TreatId;
This should be very simple, but I cannot figure out how to do it. I would like to modify the values of two different columns: one from 1 up to the total number of rows, and the other from the total number of rows down to 1 (basically an increasing and a decreasing number). I tried:
start = 0
end = number_of_rows + 1
c.execute('SELECT * FROM tablename')
newresult = c.fetchall()
for row in newresult:
    start += 1
    end -= 1
    t = (start,)
    u = (end,)
    c.execute("UPDATE tablename SET Z_PK = ?", t)  # updates every row, since there is no WHERE clause to limit it
    c.execute("UPDATE tablename SET Z_OPT = ?", u)
The thing is that I don't know how I can add the WHERE clause, since I have no values I can rely on for the rows (like ID numbers). A possibility would be to use the current row as the argument for WHERE, but I don't know how to do it...
Without an INTEGER PRIMARY KEY column, your table has an internal rowid column, which already contains the values you want:
UPDATE MyTable
SET Z_PK = rowid,
Z_OPT = (SELECT COUNT(*) FROM MyTable) + 1 - rowid
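Note that this assumes the rowids are contiguous (1 … N), which holds as long as no rows were ever deleted. If there may be gaps, here is a sketch using window functions (SQLite 3.25+), with the table and column names taken from the question:

WITH numbered AS (
    -- number the rows 1..N in rowid order, regardless of gaps
    SELECT rowid AS rid,
           ROW_NUMBER() OVER (ORDER BY rowid) AS rn,
           COUNT(*) OVER () AS total
    FROM tablename
)
UPDATE tablename
SET Z_PK  = (SELECT rn FROM numbered WHERE rid = tablename.rowid),
    Z_OPT = (SELECT total + 1 - rn FROM numbered WHERE rid = tablename.rowid);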