How to have one item pointer count serially from 1 to n across the selected rows - SQL

As shown in the example below, the output of the query contains block IDs that start at 324 and end at 327, and the item pointer (the row index within the block) restarts from 1 for each new block ID. In other words, as shown below:
for block ID 324 there is only the item pointer with index 10
for block ID 325 the item pointers start at 1 and end at 9
I want to have a single block ID so that the item pointer (the row index) starts at 1 and ends at 25.
Please let me know how to achieve that, and
why do I have four different block IDs?
ex-1
query:
select ctid
from awanti_grid_cell_data agcd
where selectedsiteid = '202230060950'
and centerPointsOfWindowAsGeoJSONInEPSG4326ForCellsInTreatment IS NOT NULL
and centerPointsOfWindowAsGeoJSONInEPSG4326ForCellsInTreatment <> 'None'
result:
|ctid |
|--------|
|(324,10)|
|(325,1) |
|(325,2) |
|(325,3) |
|(325,4) |
|(325,5) |
|(325,6) |
|(325,7) |
|(325,8) |
|(325,9) |
|(326,1) |
|(326,2) |
|(326,3) |
|(326,4) |
|(326,5) |
|(326,6) |
|(326,7) |
|(326,8) |
|(326,9) |
|(327,1) |
|(327,2) |
|(327,3) |
|(327,4) |
|(327,5) |
|(327,6) |

You are missing the point. The ctid is the physical address of a row in the table, and it is none of your business. The database is free to choose whatever place it thinks fit for a table row. As a comparison, you cannot go to the authorities and request that your social security number should be 12345678 - it is simply assigned to you, and you have no say. That's how it is with the physical location of tuples.
Very likely you are not asking this question out of pure curiosity, but because you want to solve some problem. You should instead ask a question about your real problem, and there may be a good answer to that. But whatever problem you are trying to solve, using the ctid is probably not the correct answer, in particular if you want to control it.
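If the real goal is only a running number from 1 to n over the selected rows, a window function is the usual tool, independent of where the rows are physically stored. Below is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL (it assumes SQLite 3.25 or newer for window functions). The table grid_cell and its columns cell_id, site_id and center_point are hypothetical stand-ins for the table in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified version of the table from the question.
conn.execute(
    "CREATE TABLE grid_cell (cell_id INTEGER PRIMARY KEY, site_id TEXT, center_point TEXT)"
)
rows = [("202230060950", f"point-{n}") for n in range(25)]
rows += [("202230060950", None), ("other-site", "point-x")]  # rows the filter must skip
conn.executemany("INSERT INTO grid_cell (site_id, center_point) VALUES (?, ?)", rows)

# row_number() is evaluated after the WHERE filter, so it counts only selected rows.
numbered = conn.execute(
    """
    SELECT row_number() OVER (ORDER BY cell_id) AS rn, center_point
    FROM grid_cell
    WHERE site_id = '202230060950'
      AND center_point IS NOT NULL
    ORDER BY rn
    """
).fetchall()

print(numbered[0][0], numbered[-1][0])
```

The numbering depends only on the ORDER BY inside OVER, so it stays stable even if the database moves rows between blocks.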

Find current data set using two SQL tables storing separately historical insertions and deletions

Problem
I need to do daily syncs of our latest internal data to an external audit database that does not offer an update interface. In order to update some records, I need to first generate and send in a deletion file to remove those records, and then follow by an insertion file with the same but updated records in it.
An important detail is that all of the records in deletion files must match the external records verbatim, in order to be deleted.
Proposed approach
Currently I use two separate SQL tables to version control what I have inserted/deleted.
Let's say that right now the inserted_records table looks like this:
id | file_version | contract_id | customer_name | start_year
9 | 6 | 1 | Alice | 2015
10 | 6 | 2 | Bob | 2015
11 | 6 | 3 | Charlie | 2015
Accompanied by a separate and empty deleted_records table with identical columns.
Now, if I want to
change the customer_name from Alice to Dave on line id 9
change the start_year for Bob from 2015 to 2020 on line id 10
Two new lines in inserted_records would be generated, line 12 and 13, in turn creating a new insertion file 7.
id | file_version | contract_id | customer_name | start_year
9 | 6 | 1 | Alice | 2015
10 | 6 | 2 | Bob | 2015
11 | 6 | 3 | Charlie | 2015
12 | 7 | 1 | Dave | 2015
13 | 7 | 2 | Bob | 2020
Their original column values in lines 9 and 10 are then copied into the previously empty deleted_records, in turn creating a new deletion file 1.
id | file_version | contract_id | customer_name | start_year
1 | 1 | 1 | Alice | 2015
2 | 1 | 2 | Bob | 2015
Now, if I were to send in the deletion file 1 first followed by the insertion file 7, I would get the result that I wanted.
Question
How can I query the current set of records, considering all insertions and deletions that have occurred? Assuming all records in deleted_records always have matches in inserted_records and if multiple, we always delete records with smaller file version numbers first.
I have tried by first writing one to query the inserted_records for the latest records grouped by contract_id.
select top 1 with ties *
from inserted_records
order by row_number() over (partition by contract_id order by file_version desc)
This would give me line 11, 12 and 13, which is what I wanted in this particular example. But if we also wanted to delete the record line 11 with Charlie, then my query wouldn't work anymore as it doesn't take deleted_records into account, and I have no idea how to do it in SQL.
Furthermore, my gut tells me that this approach isn't solid as there are two separate and moving parts. Perhaps there is a better approach to solve this?
How can I query the current set of records
I don't understand your question. Every SQL query is against the current set of records, if by that you mean the data currently in the database.
I do see a couple of problems.
Unless the table you're deleting from has a key defined, even an exact match on every column risks deleting more than one row.
You're performing an ad hoc update without UPDATE's transaction guarantee. I suppose the table you're updating is otherwise idle, and as a practical matter you don't have to worry about someone else (or you) re-inserting the deleted rows before your inserts arrive. But it's a problem waiting to happen.
If what you're trying to do is produce the set of rows that will be the result of a series of inserts and deletions, you haven't provided enough information to say how that could be done, or even if it's possible. There would have to be some way to uniquely identify rows, so that deletions and insertions can be associated. (They don't match on all columns, after all.) And you'd need some indication of order of operation, because it matters whether INSERT follows or precedes DELETE.
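Under the question's own assumption that every deleted record matches a previously inserted record verbatim, the surviving rows can be sketched as a multiset difference. The Python sketch below is an illustration only: it matches on (contract_id, customer_name, start_year), ignores file_version, and the helper name current_records is hypothetical.

```python
from collections import Counter

# (contract_id, customer_name, start_year) taken from the example tables.
inserted = [
    (1, "Alice", 2015),
    (2, "Bob", 2015),
    (3, "Charlie", 2015),
    (1, "Dave", 2015),
    (2, "Bob", 2020),
]
deleted = [
    (1, "Alice", 2015),
    (2, "Bob", 2015),
]

def current_records(inserted, deleted):
    """Return inserted rows minus deleted rows, treating both as multisets."""
    # Counter subtraction drops any record whose count falls to zero or below.
    remaining = Counter(inserted) - Counter(deleted)
    return sorted(remaining.elements())

print(current_records(inserted, deleted))
```

Because the difference is taken per distinct row value, a row inserted twice and deleted once still survives once, which is the behaviour a verbatim-match deletion file would produce.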

ID Extracted from string not useable for connecting to bound form - "expression ... too complex"

I have a linked table to an Outlook Mailitem folder in my Access Database. This is handy in that it keeps itself constantly updated, but I can't add an extra field to relate these records to a parent table.
My workaround was to put an automatically generated/added ID String into the Subject so I could work from there. In order to make my form work the way I need it to, I'm trying to create a query that takes the fields I need from the linked table and adds a calculated field with the extracted ID so it can be referenced for relating records in the form.
The query works fine (I get all the records and their IDs extracted) but when I try to filter records from this query by the calculated field I get:
This expression is typed incorrectly, or it is too complex to be evaluated. For example, a numeric expression may contain too many complicated elements. Try simplifying the expression by assigning parts of the expression to variables.
I tried separating the calculated field out into three fields so it's easier to read, hoping that would make it easier to evaluate for Access, but I still get the same error. My base query is currently:
SELECT InStr(Subject,"Support Project #CS")+19 AS StartID,
InStr(StartID,Subject," ") AS EndID,
Int(Mid(Subject,StartID,EndID-StartID)) AS ID,
ProjectEmails.Subject,
ProjectEmails.[From],
ProjectEmails.To,
ProjectEmails.Received,
ProjectEmails.Contents
FROM ProjectEmails
WHERE (((ProjectEmails.[Subject]) Like "*Support Project [#]CS*"));
I've tried to bind a subform to this query on qryProjectEmailWithID.ID = SupportProject.ID where the main form is bound to SupportProject, and I get the above error. I tried building a query that selects all records from that query where the ID = a given parameter and I still get the same error.
The working query that adds Support Project IDs would look like:
+----+--------------------------------------+----------------------+----------------------+------------+----------------------------------+
| ID | Subject | To | From | Received | Contents |
+----+--------------------------------------+----------------------+----------------------+------------+----------------------------------+
| 1 | RE: Support Project #CS1 ID Extra... | questions#so.com | Isaac.Reefman#so.com | 2019-03-11 | Trying to work out how to add... |
| 1 | RE: Support Project #CS1 ID Extra... | isaac.reefman#so.com | questions#so.com | 2019-03-11 | Thanks for your question. The... |
| 1 | RE: Support Project #CS1 ID Extra... | isaac.reefman#so.com | questions#so.com | 2019-03-11 | You should use a different me... |
| 2 | RE: Support Project #CS2 IT issue... | support#domain.com | someone#company.com | 2019-02-21 | I really need some help with ... |
| 2 | RE: Support Project #CS2 IT issue... | someone#company.com | support#domain.com | 2019-02-21 | Thanks for your question. The... |
| 2 | RE: Support Project #CS2 IT issue... | someone#company.com | support#domain.com | 2019-02-21 | Have you tried turning it off... |
| 3 | RE: Support Project #CS3 email br... | support#domain.com | someone#company.com | 2019-02-12 | my email server is malfunccti... |
| 3 | RE: Support Project #CS3 email br... | someone#company.com | support#domain.com | 2019-02-12 | Thanks for your question. The... |
| 3 | RE: Support Project #CS3 email br... | someone#company.com | support#domain.com | 2019-02-13 | I've just re-started the nece... |
+----+--------------------------------------+----------------------+----------------------+------------+----------------------------------+
The view in question would populate a datasheet that looks the same with just the items whose ID matches the ID of the current SupportProject record, updating when a new record is selected. A separate text box should show the full content of whichever record is selected in that grid, like this:
Have you tried turning it off and on again?
From: support#domain.com
On: 21/02/2019
Thanks for your question. The matter has been assigned to Support Project #CS2, and a support staff member will be in touch shortly to help you out. As it is considered of medium priority, you should expect daily updates.
Thanks,
Support
From: someone#company
On: 21/02/2019
I really need some help with my computer. It seems really slow and I can't do my work efficiently.
Neither of these things happens when I try to use the calculated number to relate to the PK of the SupportProject table...
I don't know if this is a part of the problem, but whether I use Int(Mid(Subject... or Val(Mid(Subject... I still apparently get a Double, where the ID field (as an autoincrement ID) is a Long. I can't work out how to force it to return a Long, so I can't test whether that's the problem.
So that is output resulting from posted SQL? I really wanted raw data but close enough. If requirement is to extract number after ...CS, calculate in query and save query:
Val(Mid([Subject],InStr([Subject],"CS")+2))
Then build another query to join first query to table.
SELECT qryProjectEmailWithID.*, SupportProject.tst
FROM qryProjectEmailWithID
INNER JOIN SupportProject ON qryProjectEmailWithID.ID = SupportProject.ID;
Filter criteria can be applied to either ID field.
A subform can display the related child records synchronized with SupportProject records on main form.
I tested the ID calc with your data and then with a link to my Inbox. No issue with query join.
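The extraction logic itself can be illustrated outside Access. The following is a rough Python sketch, not Access SQL, and it assumes every relevant subject contains the tag in the form #CS followed by digits. The helper extract_project_id is hypothetical and only mirrors the intent of the Val(Mid(...)) expression above.

```python
import re

def extract_project_id(subject):
    """Return the integer after '#CS' in the subject, or None if the tag is missing."""
    match = re.search(r"#CS(\d+)", subject)
    return int(match.group(1)) if match else None

subjects = [
    "RE: Support Project #CS1 ID Extra...",
    "RE: Support Project #CS2 IT issue...",
    "RE: unrelated mail",
]
print([extract_project_id(s) for s in subjects])
```

Returning an integer (or nothing at all) is the property that matters for the join: the extracted value has to be comparable with the autoincrement ID of SupportProject.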

SQLAlchemy getting label names out from columns

I want to use the same labels from a SQLAlchemy table, to re-aggregate some data (e.g. I want to iterate through mytable.c to get the column names exactly).
I have some spending data that looks like the following:
| name | region | date | spending |
| John | A | .... | 123 |
| Jack | A | .... | 20 |
| Jill | B | .... | 240 |
I'm then passing it to an existing function we have, that aggregates spending over 2 periods (using a case statement) and groups by region:
grouped table:
| Region | Total (this period) | Total (last period) |
| A | 3048 | 1034 |
| B | 2058 | 900 |
The function returns a SQLAlchemy query object that I can then use subquery() on to re-query e.g.:
subquery = get_aggregated_data(original_table).subquery()
region_A_results = session.query(subquery).filter(subquery.c.region == 'A')
I want to then re-aggregate this subquery (summing every column that can be summed, and replacing the region column with the string 'other').
The problem is, if I iterate through subquery.c, I get labels that look like:
anon_1.region
anon_1.sum_this_period
anon_1.sum_last_period
Is there a way to get the textual label from a set of column objects, without the anon_1. prefix? Especially since I feel that the prefix may change depending on how SQLAlchemy decides to generate the query.
Split the name string and take the second part, and if you want to prepare for the chance that the name is not prefixed by the table name, put the code in a try - except block:
for col in subquery.c:
    try:
        print(col.name.split('.')[1])
    except IndexError:
        print(col.name)
Also, the result proxy (region_A_results) has a method keys which returns a list of column names. Again, if you don't need the table names, you can easily get rid of them.

Creating an SSIS job to split a column and insert into database

I have a column called Description:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Description/Title |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Liszt, Hungarian Rhapsody #6 {'Pesther Carneval'}; 2 Episodes from Lenau's 'Faust'; 'Hunnenschlacht' Symphonic Poem. (NW German Phil./ Kulka) |
| Beethoven, Piano Sonatas 8, 23 & 26. (Justus Frantz) |
| Puccini, Verdi, Gounod, Bizet: Arias & Duets from Butterfly, Tosca, Boheme, Turandot, I Vespri, Faust, Carmen. (Fiamma Izzo d'Amico & Peter Dvorsky w.Berlin Radio Symph./Paternostro) |
| Puccini, Ponchielli, Bizet, Tchaikovsky, Donizetti, Verdi: Arias from Boheme, Manon Lescaut, Tosca, Gioconda, Carmen, Eugen Onegin, Favorita, Rigoletto, Luisa Miller, Ballo, Aida. (Peter Dvorsky, ten. w.Hungarian State Opera Orch./ Mihaly) |
| Thomas, Leslie: 'The Virgin Soldiers' (Hywel Bennett reads abridged version. Listening time app. 2 hrs. 45 mins. DOLBY) |
| Katalsky, A. {1856-1926}: Liturgy for A Cappella Chorus. Rachmaninov, 6 Choral Songs w.Piano. (Bolshoi Theater Children's Choir/ Zabornok. DOLBY) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Please note that above I'm only showing 1 field.
Also, the output that I would like is:
+-------+-------+
| Word | Count |
+-------+-------+
| Arias | 3 |
| Duets | 2 |
| Liszt | 10 |
| Tosca | 1 |
+-------+-------+
I want this output to encompass EVERY record. I do not want a separate one of these for each record, just one global one.
I am choosing to use SSIS to do this job. I'd like your input on which controls to use to help with this task.
I'm not looking for a solution, but simply some direction on how to get started with this. I understand this can be done many different ways, but I cannot seem to think of a way to do this most efficiently. Thank you for any guidance.
FYI:
This script does an excellent job of concatenating everything:
select description + ', ' as 'data()'
from [BroincInventory]
for xml path('')
But I need guidance on how to work with this result to create the required output. How can this be done with c# or with one of the SSIS components?
edit: As siyual points out below I need a script task. The script above obviously will not work since there's a limit to the size of a data point.
I think term extraction might be the component you are looking for. Check this out: http://www.mssqltips.com/sqlservertip/3194/simple-text-mining-with-the-ssis-term-extraction-component/
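Whichever SSIS component ends up doing the work, the underlying computation is a single word-frequency count over all rows. The Python sketch below shows that logic only, under the assumption that a word is a run of ASCII letters and that counting is case-sensitive. The function word_counts and the shortened sample rows are illustrative, not part of SSIS.

```python
import re
from collections import Counter

def word_counts(descriptions):
    """Count every word across all descriptions in one global tally."""
    counts = Counter()
    for text in descriptions:
        # Assumption: a word is one or more ASCII letters; digits and punctuation split words.
        counts.update(re.findall(r"[A-Za-z]+", text))
    return counts

descriptions = [
    "Puccini, Verdi, Gounod, Bizet: Arias & Duets from Butterfly, Tosca, Boheme.",
    "Puccini, Ponchielli, Bizet: Arias from Boheme, Manon Lescaut, Tosca, Gioconda.",
    "Beethoven, Piano Sonatas 8, 23 & 26. (Justus Frantz)",
]
counts = word_counts(descriptions)
for word in ("Arias", "Duets", "Tosca"):
    print(word, counts[word])
```

Streaming the rows one at a time like this avoids building one huge concatenated string, which is the size limit mentioned in the edit above.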

How to represent and insert into an ordered list in SQL?

I want to represent the list "hi", "hello", "goodbye", "good day", "howdy" (with that order), in a SQL table:
pk | i | val
------------
1 | 0 | hi
0 | 2 | hello
2 | 3 | goodbye
3 | 4 | good day
5 | 6 | howdy
'pk' is the primary key column. Disregard its values.
'i' is the "index" that defines that order of the values in the 'val' column. It is only used to establish the order and the values are otherwise unimportant.
The problem I'm having is with inserting values into the list while maintaining the order. For example, if I want to insert "hey" and I want it to appear between "hello" and "goodbye", then I have to shift the 'i' values of "goodbye" and "good day" (but preferably not "howdy") to make room for the new entry.
So, is there a standard SQL pattern to do the shift operation, but only shift the elements that are necessary? (Note that a simple "UPDATE table SET i=i+1 WHERE i>=3" doesn't work, because it violates the uniqueness constraint on 'i', and also it updates the "howdy" row unnecessarily.)
Or, is there a better way to represent the ordered list? I suppose you could make 'i' a floating point value and choose values between, but then you have to have a separate rebalancing operation when no such value exists.
Or, is there some standard algorithm for generating string values between arbitrary other strings, if I were to make 'i' a varchar?
Or should I just represent it as a linked list? I was avoiding that because I'd like to also be able to do a SELECT .. ORDER BY to get all the elements in order.
As I read your post, I kept thinking 'linked list', and at the end, I still think that's the way to go.
If you are using Oracle, and the linked list is a separate table (or even the same table with a self referencing id - which i would avoid) then you can use a CONNECT BY query and the pseudo-column LEVEL to determine sort order.
You can easily achieve this by using a cascading trigger that updates any 'index' entry equal to the new one on the insert/update operation to the index value +1. This will cascade through all rows until the first gap stops the cascade - see the second example in this blog entry for a PostgreSQL implementation.
This approach should work independent of the RDBMS used, provided it offers support for triggers to fire before an update/insert. It basically does what you'd do if you implemented your desired behavior in code (increase all following index values until you encounter a gap), but in a simpler and more effective way.
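The "shift until the first gap" idea can also be sketched in application code instead of a trigger. The example below uses Python's built-in sqlite3 module. The table name items and the helper insert_at are assumptions for illustration, and the updates run from the highest index downward so the UNIQUE constraint on i is never violated.

```python
import sqlite3

def insert_at(conn, position, value):
    """Insert value at index `position`, shifting only the rows up to the first gap."""
    taken = {row[0] for row in conn.execute("SELECT i FROM items")}
    # Find the first free index at or after the requested position.
    gap = position
    while gap in taken:
        gap += 1
    # Move the blocking rows up by one, highest index first.
    for i in range(gap - 1, position - 1, -1):
        conn.execute("UPDATE items SET i = i + 1 WHERE i = ?", (i,))
    conn.execute("INSERT INTO items (i, val) VALUES (?, ?)", (position, value))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (pk INTEGER PRIMARY KEY, i INTEGER UNIQUE, val TEXT)")
conn.executemany(
    "INSERT INTO items (i, val) VALUES (?, ?)",
    [(0, "hi"), (2, "hello"), (3, "goodbye"), (4, "good day"), (6, "howdy")],
)
insert_at(conn, 3, "hey")
ordered = conn.execute("SELECT i, val FROM items ORDER BY i").fetchall()
print(ordered)
```

With the sample data only "goodbye" and "good day" move, because index 5 is free. The "howdy" row at index 6 is left alone.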
Alternatively, if you can live with a restriction to SQL Server, check the hierarchyid type. While mainly geared at defining nested hierarchies, you can use it for flat ordering as well. It somewhat resembles your approach using floats, as it allows insertion between two positions by assigning fractional values, thus avoiding the need to update other entries.
If you use strings instead of numbers, you may have a table:
pk | i | val
------------
1 | a0 | hi
0 | a2 | hello
2 | a3 | goodbye
3 | b | good day
5 | b1 | howdy
You may insert a4 between a3 and b, a21 between a2 and a3, a1 between a0 and a2, and so on. You would need a clever function to generate an i for a new value v between p and n. The index can become longer and longer, or you need a big rebalancing from time to time.
Another approach could be to implement a (doubly) linked list in the table, where you don't save indexes but links to the previous and next elements, which would mean that you normally have to update 1-2 elements:
pk | prev | val
------------
1 | 0 | hi
0 | 1 | hello
2 | 0 | goodbye
3 | 2 | good day
5 | 3 | howdy
Inserting hey between hello and goodbye:
hey gets pk 6,
pk | prev | val
------------
1 | 0 | hi
0 | 1 | hello
6 | 0 | hey <- ins
2 | 6 | goodbye <- upd
3 | 2 | good day
5 | 3 | howdy
The previous element of hey is hello with pk=0, and goodbye, which has linked to hello until now, has to link to hey from now on.
But I don't know if it is possible to find an 'order by' mechanism that works across many DB implementations.
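One possible ordering mechanism for such a linked list is a recursive common table expression, which several engines support in some form. Below is a minimal sketch with Python's built-in sqlite3 module. It assumes the head of the list stores NULL in prev (unlike the tables above, where the head's prev value is ambiguous), and the table name items is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (pk INTEGER PRIMARY KEY, prev INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO items (pk, prev, val) VALUES (?, ?, ?)",
    [
        (1, None, "hi"),      # head of the list: no predecessor
        (0, 1, "hello"),
        (6, 0, "hey"),
        (2, 6, "goodbye"),
        (3, 2, "good day"),
        (5, 3, "howdy"),
    ],
)

# Start at the head and repeatedly follow the row whose prev points at the current row.
ordered = [
    row[0]
    for row in conn.execute(
        """
        WITH RECURSIVE walk(pk, val, pos) AS (
            SELECT pk, val, 1 FROM items WHERE prev IS NULL
            UNION ALL
            SELECT items.pk, items.val, walk.pos + 1
            FROM items JOIN walk ON items.prev = walk.pk
        )
        SELECT val FROM walk ORDER BY pos
        """
    )
]
print(ordered)
```

The cost is one join step per element, so this tends to suit short lists better than very long ones.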
Since I had a similar problem, here is a very simple solution:
Make your i column floats, but insert integer values for the initial data:
pk | i | val
------------
1 | 0.0 | hi
0 | 2.0 | hello
2 | 3.0 | goodbye
3 | 4.0 | good day
5 | 6.0 | howdy
Then, if you want to insert something in between, just compute a float value in the middle between the two surrounding values:
pk | i | val
------------
1 | 0.0 | hi
0 | 2.0 | hello
2 | 3.0 | goodbye
3 | 4.0 | good day
5 | 6.0 | howdy
6 | 2.5 | hey
This way the number of inserts between the same two values is limited to the resolution of float values but for almost all cases that should be more than sufficient.
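The midpoint rule and its resolution limit can be sketched in a few lines of Python. The helpers midpoint_key and inserts_until_exhausted are hypothetical names, and the dictionary stands in for the table, keyed by the i column.

```python
def midpoint_key(before, after):
    """Return a float strictly between the two neighbouring keys."""
    mid = (before + after) / 2
    if not before < mid < after:
        # The two neighbours are adjacent floats: nothing fits in between any more.
        raise ValueError("no float left between the neighbours; rebalance the keys")
    return mid

def inserts_until_exhausted(before, after):
    """Count how often one can keep inserting directly after `before`."""
    count = 0
    while True:
        try:
            after = midpoint_key(before, after)
        except ValueError:
            return count
        count += 1

items = {0.0: "hi", 2.0: "hello", 3.0: "goodbye", 4.0: "good day", 6.0: "howdy"}
items[midpoint_key(2.0, 3.0)] = "hey"
ordered = [items[key] for key in sorted(items)]
print(ordered)
print(inserts_until_exhausted(2.0, 3.0))
```

The second printed number is the worst case, where every insert lands in the same shrinking interval. With double-precision floats it is on the order of fifty, after which a rebalancing pass that renumbers the keys is needed.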