I have two tables:
Master_Equipment_Index (alias mei) containing the columns serial_num & model_num
Customer Equipment Index (alias cei) containing the columns account_num, serial_num, & model_num
Originally, no guard rails were implemented to require a model attribute whenever new serial_num records were inserted into the mei data. When such a serial_num is later associated with a customer account in the cei data, the model carries over as null.
What I want to do is backfill the missing model attributes in the cei data from the mei data based on the strongest sequential character match from other similar serial_nums in the mei data.
To further clarify, I don't have access to mass update the mei or cei datasets. I can formalize change requests, but I need to build the function out to prove its worth. So this has to be done outside of any mass action query updates.
| cei.account_num | cei.serial_num | cei.model | mei.serial_num | mei.model | serial_num_str_match | row_number |
| --------------- | -------------- | --------- | -------------- | ------------- | -------------------- | ---------- |
| 123123123 | B4I4SXT1708 | null | B4I4SXT178A | Model_Series1 | 8 | 1 |
| 123123123 | B4I4SXT1708 | null | B4I4SXTAS34 | Model_Series2 | 7 | 2 |
In the table example above, row_number 1 has a higher consecutive string match count than row_number 2. I want to return only row_number 1 and populate cei.model with mei.model's value.
Desired result:

| cei.account_num | cei.serial_num | cei.model | mei.serial_num | mei.model | serial_num_str_match | row_number |
| --------------- | -------------- | --------- | -------------- | ------------- | -------------------- | ---------- |
| 123123123 | B4I4SXT1708 | Model_Series1 | B4I4SXT178A | Model_Series1 | 8 | 1 |
To give an idea of the scale:
The mei data contains 1 million records and the cei data contains 50,000 records. I would have to perform this string match for every cei.account_num, cei.serial_num pair where the cei.model data is null.
With MAC addresses, the first 6 characters identify the vendor; I take a similar approach in the sample SQL below to help reduce the volume of transactional 1:many lookups taking place:
/* need to define function */
create temp function string_match_function(x any type, y any type) as (
  -- syntax to generate consecutive string count matches between x and y
);
select * from (
  select
    a.account_num,
    a.serial_num,
    a.model,
    row_number() over(partition by a.account_num, a.serial_num order by a.serial_num_str_match desc) seq
  from (
    select
      c.account_num,
      c.serial_num,
      m.model,
      string_match_function(c.serial_num, m.serial_num) as serial_num_str_match -- needed: the function to define
    from (
      select * from cei where model is null
    ) c
    join (
      select * from mei where model is not null
    ) m on substr(c.serial_num,1,6) = substr(m.serial_num,1,6)
  ) as a
) as b
where seq = 1
I've looked at different options, some coming from https://hoffa.medium.com/new-in-bigquery-persistent-udfs-c9ea4100fd83, but I'm not finding what I need.
Any insight or direction would be greatly appreciated.
This UDF counts the number of equal characters at the beginning of the two strings:
CREATE TEMP FUNCTION string_match_function(x string, y string)
RETURNS int64
LANGUAGE js
AS r"""
  // walk both strings in lockstep and return the index of the first mismatch,
  // i.e. the number of equal leading characters
  var i = 0;
  var max_len = Math.min(x.length, y.length);
  for (i = 0; i < max_len; i++) {
    if (x[i] != y[i]) { return i; }
  }
  return i;
""";
select string_match_function("12a345","1234")
gives 2, because both start with 12
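If you'd rather stay in pure SQL, the same leading-match count can be sketched as a SQL UDF. This is a minimal sketch assuming BigQuery Standard SQL (GENERATE_ARRAY/UNNEST); the name string_match_function_sql is just illustrative:

CREATE TEMP FUNCTION string_match_function_sql(x STRING, y STRING)
RETURNS INT64 AS (
  -- position of the first mismatching character, minus 1;
  -- if the shorter string is a prefix of the other, its full length
  IFNULL(
    (SELECT MIN(pos) - 1
     FROM UNNEST(GENERATE_ARRAY(1, LEAST(LENGTH(x), LENGTH(y)))) AS pos
     WHERE SUBSTR(x, pos, 1) != SUBSTR(y, pos, 1)),
    LEAST(LENGTH(x), LENGTH(y))
  )
);

select string_match_function_sql("12a345","1234") -- also gives 2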
I have two tables in SQL. I need to add rows from one table to another. The table to which I add rows looks like:
timestamp, deviceID, value
2020-10-04, 1, 0
2020-10-04, 2, 0
2020-10-07, 1, 1
2020-10-08, 2, 1
But I should add a row to this table only if the value for a particular deviceID has changed compared to its last timestamp.
For example, the record "2020-10-09, 2, 1" won't be added, because the value wasn't changed for deviceID = 2 at the last timestamp "2020-10-08". At the same time, the record "2020-10-09, 1, 0" will be added, because the value for deviceID = 1 was changed to 0.
I have a problem with writing a query for this logic. I have written something like this:
insert into output
select *
from values
where value != (
select value
from output
where timestamp = (select max(timestamp) from output) and output.deviceID = values.deviceID)
Of course it doesn't work because of the last part of the query "and output.deviceID = values.deviceID".
Actually, the problem is that I don't know how to take the value from the "output" table where the deviceID is the same as in the row I'm trying to insert.
I would use order by and something to limit to one row:
insert into output
    select *
    from values v
    where v.value <> (select o2.value
                      from output o2
                      where o2.deviceId = v.deviceId
                      order by o2.timestamp desc
                      fetch first 1 row only
                     );
The above is standard SQL. Specific databases may have other ways to express this, such as limit or top (1).
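For databases that spell it with LIMIT, here is a minimal sketch of the same idea (assuming the table really is named values as in the question, with an alias added because the correlated subquery needs one). Note that a plain <> comparison drops devices with no prior row, since value <> NULL is not true; PostgreSQL's is distinct from also lets first-time readings through:

insert into output
    select v.timestamp, v.deviceID, v.value
    from values v
    where v.value is distinct from (select o2.value
                                    from output o2
                                    where o2.deviceID = v.deviceID
                                    order by o2.timestamp desc
                                    limit 1
                                   );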
This is a simple question.
Background
I'm supposed to have a max of 400 rows in some table, based on a timestamp field, so old ones will be removed automatically. For this question, let's say the max is 3 instead.
The table has various fields, but the timestamp is what's important here.
The problem
Even though I've succeeded (looked here), for some reason it left me with one extra item, so I just adjusted the constant accordingly. This means that instead of 3, I got 4 items.
private const val MAX_ITEMS = 3
private val TIMESTAMP_FIELD = "timestamp"
private val DELETE_FROM_CALL_LOG_TILL_TRIGGER =
    String.format(
        "CREATE TRIGGER %1\$s INSERT ON %2\$s " +
            "WHEN (SELECT COUNT(*) FROM %2\$s) > %3\$d " +
            "BEGIN " +
            "DELETE FROM %2\$s WHERE %2\$s._id IN " +
            "(SELECT %2\$s._id FROM %2\$s ORDER BY %2\$s.$TIMESTAMP_FIELD DESC LIMIT %3\$d, -1); " +
            "END;",
        "delete_till_reached_max", TABLE_NAME, MAX_ITEMS - 1)
What I've tried
I tried:
Change the condition to just being insertion (meaning without the WHEN part).
Change LIMIT %3\$d, -1 to LIMIT -1 OFFSET %3\$d. Also tried a number other than -1 (tried 0, because I thought it was extra).
The questions
How come I had to use MAX_ITEMS - 1 instead of just MAX_ITEMS? Why does it leave me with 4 items instead of 3?
Does it matter if I have WHEN there? Is it better?
You have omitted the BEFORE | AFTER clause, so it's BEFORE by default. This means you are counting the rows before the insert, not after it.
This depends. At first, when the table has not reached the limit yet, the quick count lookup may save you some time, as you avoid the more complicated delete. But as soon as the table is full, you'll have to delete anyway, so counting is just additional work to do.
This should work:
private const val MAX_ITEMS = 3
private val TIMESTAMP_FIELD = "timestamp"
private val DELETE_FROM_CALL_LOG_TILL_TRIGGER =
    String.format(
        "CREATE TRIGGER %1\$s AFTER INSERT ON %2\$s " +
            "FOR EACH ROW " +
            "BEGIN " +
            "DELETE FROM %2\$s WHERE _id = " +
            "(SELECT _id FROM %2\$s ORDER BY %4\$s DESC LIMIT 1 OFFSET %3\$d); " +
            "END;",
        "delete_till_reached_max", TABLE_NAME, MAX_ITEMS, TIMESTAMP_FIELD)
Once there are 400 rows in the table, the delete will happen on every insert anyway, so you can just as well call the trigger something like trg_keep_rowcount_constant and remove any remaining row-count check from the code.
Demo: https://dbfiddle.uk/?rdbms=sqlite_3.27&fiddle=ea3867e20e85927a2de047908771f4f1
I have a problem regarding grouping when a value is the same as in the row above.
Our statement looks like this:
SELECT pat_id,
treatData.treatmentdate AS Date,
treatMeth.name AS TreatDataTableInfo,
treatData.treatmentid AS TreatID
FROM dialysistreatmentdata treatData
LEFT JOIN hdtreatmentmethods treatMeth
ON treatMeth.id = treatData.hdtreatmentmethodid
WHERE treatData.hdtreatmentmethodid IS NOT NULL
AND Year(treatData.treatmentdate) >= 2013
AND ekeyid = 12
ORDER BY treatData.ekeyid,
treatmentdate DESC,
treatdatatableinfo;
The output looks like this:
The desired output should be grouped if the value is the same as in the row/rows before, and there should be a ToDate column which, as you can see in the screenshot, is the date of the next row minus 1 day.
The desired output should look like this:
I hope someone has a solution for this matter!
Or maybe someone has an idea how to solve this problem within QlikView.
Looking forward to your solutions,
Michael
You want to collapse episodes of treatment into single rows. This is a "gaps-and-islands" problem. I like the difference-of-row-numbers approach: within a run of rows that share the same name, both row numbers increase in step, so their difference stays constant and identifies the island:
select pat_id, min(date) as fromdate, max(date) as todate, TreatDataTableInfo,
       min(treatid) as treatid
from (select td.Pat_ID, td.TreatmentDate as Date, tm.Name as TreatDataTableInfo,
             td.TreatmentID as TreatID,
             row_number() over (partition by td.Pat_ID order by td.TreatmentDate) as seqnum_p,
             row_number() over (partition by td.Pat_ID, tm.Name order by td.TreatmentDate) as seqnum_pn
      from DialysisTreatmentData td left join
           HDTreatmentMethods tm
           on tm.ID = td.HDTreatmentMethodID
      where td.HDTreatmentMethodID is not null and
            td.TreatmentDate >= '2013-01-01' and
            EKeyID = 12
     ) t
group by pat_id, TreatDataTableInfo, (seqnum_p - seqnum_pn)
order by pat_id, fromdate desc, TreatDataTableInfo;
Note: This uses the ANSI standard window function row_number(), which is available in most databases.
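To see why the difference of the two row numbers identifies the islands, here is a minimal sketch with hypothetical inline data (PostgreSQL-style VALUES list); the difference is constant within each run of equal names and changes as soon as the name changes:

with t(date, name) as (
     values ('2013-01-18', 'HDF Pradilution'),
            ('2013-01-25', 'HDF Pradilution'),
            ('2013-02-07', 'HD'),
            ('2013-02-21', 'HDF Postdilution')
)
select date, name,
       row_number() over (order by date) as seqnum_p,
       row_number() over (partition by name order by date) as seqnum_pn,
       row_number() over (order by date) -
       row_number() over (partition by name order by date) as island
from t;
-- island is 0 for the two 'HDF Pradilution' rows, 2 for 'HD', 3 for 'HDF Postdilution'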
Below is a possible QlikView solution. I've put some comments in the script. If it's not clear, just let me know. The result picture is below the script.
RawData:
Load * Inline [
Pat_ID,Date,TreatDataTableInfo,TreatId
PatNum_12,08.07.2016,HDF Pradilution,1
PatNum_12,07.07.2016,HDF Predilution,2
PatNum_12,23.03.2016,HD,3
PatNum_12,24.11.2015,HD,4
PatNum_12,22.11.2015,HD,5
PatNum_12,04.09.2015,HD,6
PatNum_12,01.09.2015,HD,7
PatNum_12,30.07.2015,HD,8
PatNum_12,12.01.2015,HD,9
PatNum_12,09.01.2015,HD,10
PatNum_12,26.08.2014,Hemodialysis,11
PatNum_12,08.07.2014,Hemodialysis,12
PatNum_12,23.05.2014,Hemodialysis,13
PatNum_12,19.03.2014,Hemodialysis,14
PatNum_12,29.01.2014,Hemodialysis,15
PatNum_12,14.12.2013,Hemodialysis,16
PatNum_12,26.10.2013,Hemodialysis,17
PatNum_12,05.10.2013,Hemodialysis,18
PatNum_12,03.10.2013,HD,19
PatNum_12,24.06.2013,Hemodialysis,20
PatNum_12,03.06.2013,Hemodialysis,21
PatNum_12,14.05.2013,Hemodialysis,22
PatNum_12,26.02.2013,HDF Postdilution,23
PatNum_12,23.02.2013,HDF Pradilution,24
PatNum_12,21.02.2013,HDF Postdilution,25
PatNum_12,07.02.2013,HD,26
PatNum_12,25.01.2013,HDF Pradilution,27
PatNum_12,18.01.2013,HDF Pradilution,28
];
GroupedData:
Load
*,
// assign new GroupId for all rows where the TreatDataTableInfo is equal
if( RowNo() = 1, 1,
if( TreatDataTableInfo <> peek('TreatDataTableInfo'),
peek('GroupId') + 1, peek('GroupId'))) as GroupId,
// assign new GroupSubId (incremental int) for all the records in each group
if( TreatDataTableInfo <> peek('TreatDataTableInfo'),
1, peek('GroupSubId') + 1) as GroupSubId,
// pick the first TreatId value and spread it across the group
if( TreatDataTableInfo <> peek('TreatDataTableInfo'), TreatId, peek('TreatId_Temp')) as TreatId_Temp
Resident
RawData
;
Drop Table RawData;
right join (GroupedData)
// get the max GroupSubId for each group and right join it to
// the GroupedData table to remove the records we dont need
MaxByGroup:
Load
max(GroupSubId) as GroupSubId,
GroupId
Resident
GroupedData
Group By
GroupId
;
// these are not needed anymore
Drop Fields GroupId, GroupSubId, TreatId;
// replace the old TreatId with the new TreatId_Temp field
// which contains the first TreatId for each group
Rename Field TreatId_Temp to TreatId;
This should be very simple, but I cannot figure out how to do it. I would like to modify the values of two different columns: one from 1 up to the total number of rows, and the other from the total number of rows down to 1 (basically an increasing and a decreasing number). I tried:
start = 0
end = number_of_rows + 1
c.execute('SELECT * FROM tablename')
newresult = c.fetchall()
for row in newresult:
    start += 1
    end -= 1
    t = (start,)
    u = (end,)
    c.execute("UPDATE tablename SET Z_PK = ?", t)  # updates every row, since there is no WHERE clause to limit it
    c.execute("UPDATE tablename SET Z_OPT = ?", u)
The thing is that I don't know how I can add the WHERE clause, since I have no values I can rely on for the rows (like ID numbers). A possibility would be to use the current row as the argument for WHERE, but I don't know how to do it...
Without an INTEGER PRIMARY KEY column, your table has an internal rowid column, which already contains the values you want:
UPDATE MyTable
SET Z_PK = rowid,
Z_OPT = (SELECT COUNT(*) FROM MyTable) + 1 - rowid
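Note that this assumes the rowids are contiguous (1 … N), which holds as long as no rows were ever deleted. If there may be gaps, here is a sketch using window functions (SQLite 3.25+), with the table and column names taken from the question:

WITH numbered AS (
    -- number the rows 1..N in rowid order, regardless of gaps
    SELECT rowid AS rid,
           ROW_NUMBER() OVER (ORDER BY rowid) AS rn,
           COUNT(*) OVER () AS total
    FROM tablename
)
UPDATE tablename
SET Z_PK  = (SELECT rn FROM numbered WHERE rid = tablename.rowid),
    Z_OPT = (SELECT total + 1 - rn FROM numbered WHERE rid = tablename.rowid);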