AQL cannot find row by PK, but the row shows up in SELECT - aerospike

I have a namespace and a set in aerospike with information inside.
After performing a SELECT * FROM mytecache.search_results I get around 600 rows of this:
"bdee9a37e3217f28a057b5e0ebbef43f_Sabre_13" | "a:11:{s:5:"price";a:14:{s:5:"total";d:2947.5999999999999;s:11:"pricePerPax";d:1473.8;s:8:"totalflt";d:2947.5999999999999;s:9:"baseprice";d:1901.0799999999999;s:3:"tax";d:986.51999999999998;s:10:"servicefee";i:60;s:11:"markupprice";d:1961.0799999999999;s | "flight_bdee9a37e3217f28a057b5e0ebbef43f" | 147380 | LIST('["LH", "LH", "LH", "LH"]'....
As you know, the first column is the primary key, so I try to select the row (which just appeared in the SELECT) with:
aql> SELECT * FROM mytecache.search_results WHERE PK = "bdee9a37e3217f28a057b5e0ebbef43f_Sabre_13"
and I get:
Error: (2) AEROSPIKE_ERR_RECORD_NOT_FOUND
Explain returns this:
aql> EXPLAIN SELECT * FROM mytecache.search_results WHERE PK="bdee9a37e3217f28a057b5e0ebbef43f_Sabre_13"
+------------------+--------------------------------------------+-------------+-----------+--------+---------+----------+----------------------------+-------------------+-------------------------+---------+
| SET | DIGEST | NAMESPACE | PARTITION | STATUS | UDF | KEY_TYPE | POLICY_REPLICA | NODE | POLICY_KEY | TIMEOUT |
+------------------+--------------------------------------------+-------------+-----------+--------+---------+----------+----------------------------+-------------------+-------------------------+---------+
| "search_results" | "6F62BE5323F0A51EF7DFDD5060A02E62CD2453F0" | "mytecache" | 623 | 2 | "FALSE" | "STRING" | "AS_POLICY_REPLICA_MASTER" | "BB9B25B63000056" | "AS_POLICY_KEY_DEFAULT" | 1000 |
+------------------+--------------------------------------------+-------------+-----------+--------+---------+----------+----------------------------+-------------------+-------------------------+---------+
1 row in set (0.002 secs)
asinfo outputs:
1 : node
BB9B25B63000056
2 : statistics
cluster_size=1;cluster_key=883272F81547;cluster_integrity=true;cluster_is_member=true;uptime=1146;system_free_mem_pct=90;system_swapping=false;heap_allocated_kbytes=6525778;heap_active_kbytes=6529792;heap_mapped_kbytes=6619136;heap_efficiency_pct=99;heap_site_count=14;objects=6191;tombstones=0;tsvc_queue=0;info_queue=0;delete_queue=0;rw_in_progress=0;proxy_in_progress=0;tree_gc_queue=0;client_connections=19;heartbeat_connections=0;fabric_connections=0;heartbeat_received_self=7637;heartbeat_received_foreign=0;reaped_fds=47;info_complete=12326;proxy_retry=0;demarshal_error=0;early_tsvc_client_error=0;early_tsvc_batch_sub_error=0;early_tsvc_udf_sub_error=0;batch_index_initiate=0;batch_index_queue=0:0,0:0,0:0,0:0;batch_index_complete=0;batch_index_error=0;batch_index_timeout=0;batch_index_unused_buffers=0;batch_index_huge_buffers=0;batch_index_created_buffers=0;batch_index_destroyed_buffers=0;batch_initiate=0;batch_queue=0;batch_error=0;batch_timeout=0;scans_active=0;query_short_running=0;query_long_running=0;sindex_ucgarbage_found=0;sindex_gc_locktimedout=0;sindex_gc_list_creation_time=39;sindex_gc_list_deletion_time=0;sindex_gc_objects_validated=11734;sindex_gc_garbage_found=0;sindex_gc_garbage_cleaned=0;paxos_principal=BB9B25B63000056;migrate_allowed=true;migrate_partitions_remaining=0;fabric_bulk_send_rate=0;fabric_bulk_recv_rate=0;fabric_ctrl_send_rate=0;fabric_ctrl_recv_rate=0;fabric_meta_send_rate=0;fabric_meta_recv_rate=0;fabric_rw_send_rate=0;fabric_rw_recv_rate=0
3 : features
peers;cdt-list;cdt-map;pipelining;geo;float;batch-index;replicas-all;replicas-master;replicas-prole;udf
4 : cluster-generation
1
5 : partition-generation
0
6 : build_time
Sat Nov 4 02:19:49 UTC 2017
7 : edition
Aerospike Community Edition
8 : version
Aerospike Community Edition build 3.15.0.2
9 : build
3.15.0.2
10 : services
11 : services-alumni
12 : build_os
debian8
Which seems wrong. Why is the query not returning the result when it is clearly there?

"As you know, the first column is the primary key" -- no, the first column is your first bin. The primary key is whatever you used when you wrote the record; Aerospike does not store your primary key, only a digest of it.
You are quite likely using the wrong PK for the record whose first bin is "bdee9a37e3217f28a057b5e0ebbef43f_Sabre_13"
https://www.aerospike.com/docs/tools/aql/data_management.html
“STATUS” is the status of the operation. It will be the code as returned by the client for the record. E.g: 0 is AEROSPIKE_OK and 2 is AEROSPIKE_ERR_RECORD_NOT_FOUND.
So even the EXPLAIN output in AQL is telling you, via STATUS = 2, that the record is not found.

Related

Longest Common Prefix Per Aggregate Using Apache Spark SQL

I am using Apache Spark to find the longest common prefix per session.
Given the following example:
session | prefix
_____________________
1 | keys
1 | key chain
1 | keysmith
2 | tim
2 | timmy
2 | tim hortons
I would like to format this into the following output:
session | prefix
_____________________
1 | key
2 | tim
I saw an example which checks a column in one row against all others but I have trouble wrapping my head around how to do this for aggregate rows.
Any help is appreciated!
Try something like the below (note this returns the length of the shortest prefix per session, not the common prefix itself):
select session, min(length(prefix))
from table_name
group by session
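Computing the actual longest common prefix requires comparing characters, not just lengths. A minimal sketch of the per-group reduction in plain Python (illustrative only, not Spark; in Spark you could collect the prefixes per session, e.g. with collect_list, and apply a function like this as a UDF):

```python
import os
from itertools import groupby

rows = [
    (1, "keys"), (1, "key chain"), (1, "keysmith"),
    (2, "tim"), (2, "timmy"), (2, "tim hortons"),
]

def lcp_per_session(rows):
    """Group rows by session and reduce each group to its longest common prefix."""
    result = {}
    for session, group in groupby(sorted(rows), key=lambda r: r[0]):
        prefixes = [prefix for _, prefix in group]
        # os.path.commonprefix does a plain character-wise comparison,
        # despite living in os.path -- exactly what we need here.
        result[session] = os.path.commonprefix(prefixes)
    return result

print(lcp_per_session(rows))  # {1: 'key', 2: 'tim'}
```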

Find current data set using two SQL tables storing separately historical insertions and deletions

Problem
I need to do daily syncs of our latest internal data to an external audit database that does not offer an update interface. In order to update some records, I need to first generate and send in a deletion file to remove those records, and then follow by an insertion file with the same but updated records in it.
An important detail is that all of the records in deletion files must match the external records verbatim, in order to be deleted.
Proposed approach
Currently I use two separate SQL tables to version control what I have inserted/deleted.
Let's say that right now the inserted_records table looks like this:
id | file_version | contract_id | customer_name | start_year
9 | 6 | 1 | Alice | 2015
10 | 6 | 2 | Bob | 2015
11 | 6 | 3 | Charlie | 2015
Accompanied by a separate and empty deleted_records table with identical columns.
Now, if I want to
change the customer_name from Alice to Dave on line id 9
change the start_year for Bob from 2015 to 2020 on line id 10
Two new lines in inserted_records would be generated, lines 12 and 13, in turn creating a new insertion file 7.
id | file_version | contract_id | customer_name | start_year
9 | 6 | 1 | Alice | 2015
10 | 6 | 2 | Bob | 2015
11 | 6 | 3 | Charlie | 2015
12 | 7 | 1 | Dave | 2015
13 | 7 | 2 | Bob | 2020
The original column values of lines 9 and 10 are then copied into the previously empty deleted_records, in turn creating a new deletion file 1.
id | file_version | contract_id | customer_name | start_year
1 | 1 | 1 | Alice | 2015
2 | 1 | 2 | Bob | 2015
Now, if I were to send in the deletion file 1 first followed by the insertion file 7, I would get the result that I wanted.
Question
How can I query the current set of records, considering all insertions and deletions that have occurred? Assuming all records in deleted_records always have matches in inserted_records and if multiple, we always delete records with smaller file version numbers first.
I have tried by first writing one to query the inserted_records for the latest records grouped by contract_id.
select top 1 with ties *
from insertion_record
order by row_number() over (partition by contract_id order by file_version desc)
This would give me line 11, 12 and 13, which is what I wanted in this particular example. But if we also wanted to delete the record line 11 with Charlie, then my query wouldn't work anymore as it doesn't take deleted_records into account, and I have no idea how to do it in SQL.
Furthermore, my gut tells me that this approach isn't solid, as there are two separate moving parts; perhaps there is a better approach to solve this?
How can I query the current set of records
I don't understand your question. Every SQL query is against the current set of records, if by that you mean the data currently in the database.
I do see a couple of problems.
Unless the table you're deleting from has a key defined, even an exact match on every column risks deleting more than one row.
You're performing an ad hoc update without UPDATE's transaction guarantee. I suppose the table you're updating is otherwise idle, and as a practical matter you don't have to worry about someone else (or you) re-inserting the deleted rows before your inserts arrive. But it's a problem waiting to happen.
If what you're trying to do is produce the set of rows that will be the result of a series of inserts and deletions, you haven't provided enough information to say how that could be done, or even if it's possible. There would have to be some way to uniquely identify rows, so that deletions and insertions can be associated. (They don't match on all columns, after all.) And you'd need some indication of order of operation, because it matters whether INSERT follows or precedes DELETE.
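That said, under the asker's own stated assumptions (deletions match insertions verbatim on the data columns, and the oldest matching insertion is deleted first), one way to sketch the "current set" query is to number insertions and deletions per data-value group with ROW_NUMBER and anti-join them. A runnable illustration using SQLite from Python, with the question's sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inserted_records (id INTEGER, file_version INTEGER,
    contract_id INTEGER, customer_name TEXT, start_year INTEGER);
CREATE TABLE deleted_records (id INTEGER, file_version INTEGER,
    contract_id INTEGER, customer_name TEXT, start_year INTEGER);
INSERT INTO inserted_records VALUES
    (9, 6, 1, 'Alice', 2015), (10, 6, 2, 'Bob', 2015),
    (11, 6, 3, 'Charlie', 2015), (12, 7, 1, 'Dave', 2015),
    (13, 7, 2, 'Bob', 2020);
INSERT INTO deleted_records VALUES
    (1, 1, 1, 'Alice', 2015), (2, 1, 2, 'Bob', 2015);
""")

# Pair each deletion with the oldest still-unmatched insertion carrying the
# same data values, then keep only insertions without such a pairing.
current = conn.execute("""
WITH ins AS (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY contract_id, customer_name, start_year
        ORDER BY file_version) AS rn
    FROM inserted_records
),
del AS (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY contract_id, customer_name, start_year
        ORDER BY file_version) AS rn
    FROM deleted_records
)
SELECT i.id, i.contract_id, i.customer_name, i.start_year
FROM ins i
LEFT JOIN del d ON d.contract_id = i.contract_id
               AND d.customer_name = i.customer_name
               AND d.start_year = i.start_year
               AND d.rn = i.rn
WHERE d.id IS NULL
ORDER BY i.id
""").fetchall()
print(current)  # [(11, 3, 'Charlie', 2015), (12, 1, 'Dave', 2015), (13, 2, 'Bob', 2020)]
```

This also handles the Charlie case: adding a matching row to deleted_records would drop id 11 from the result. It remains a sketch, not a substitute for the keyed design the answer above recommends.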

Find referenced value of multiple columns

I have a table Setpoints which contains 3 columns, Base, Effective and Actual, each holding an id that refers to an item found in io.
I would like to make a query that will return the io_value found in the io table for the id found in Setpoints.
Currently my query will return multiple id's and then I query the io table to find the io_value for each id.
Ex Query returning the ID's in the row
row # | base | effective | actual
1 | 24 | 30 | 40
2 | 25 | 31 | 41
3 | 26 | 32 | 42
But I want it to return the values instead of the ids.
Ex returning the value for the id's instead
row # | base | effective | actual
1 | 2.3 | 4.5 | 3.44
2 | 4.2 | 7.7 | 4.41
3 | 3.9 | 8.12 | 5.42
Here are the table fields
IO
io_value
io_id
Setpoints
stpt_base
stpt_effective
stpt_actual
Using postgres 9.5
What I'm using now:
SELECT * from setpoints
For each row
SELECT io_id, io_value
from io
where io_id in
(stpt_effective, stpt_actual, stpt_base);
// these are from previous query
You can solve this by joining the io table three times to the setpoints table, using the three columns in each individual JOIN:
SELECT a.io_value AS base,
b.io_value AS effective,
c.io_value AS actual
FROM setpoints s
JOIN io a ON a.io_id = s.stpt_base
JOIN io b ON b.io_id = s.stpt_effective
JOIN io c ON c.io_id = s.stpt_actual;
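A runnable illustration of the triple join, using SQLite from Python with sample values taken from the question's example tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE io (io_id INTEGER PRIMARY KEY, io_value REAL);
CREATE TABLE setpoints (stpt_base INTEGER, stpt_effective INTEGER, stpt_actual INTEGER);
INSERT INTO io VALUES (24, 2.3), (30, 4.5), (40, 3.44), (25, 4.2), (31, 7.7), (41, 4.41);
INSERT INTO setpoints VALUES (24, 30, 40), (25, 31, 41);
""")

# Each of the three id columns gets its own alias of io.
rows = conn.execute("""
SELECT a.io_value AS base,
       b.io_value AS effective,
       c.io_value AS actual
FROM setpoints s
JOIN io a ON a.io_id = s.stpt_base
JOIN io b ON b.io_id = s.stpt_effective
JOIN io c ON c.io_id = s.stpt_actual
""").fetchall()
print(sorted(rows))  # [(2.3, 4.5, 3.44), (4.2, 7.7, 4.41)]
```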

SQL group by one column, sort by another and transpose a third

I have the following table, which is actually the minimal example of the result of multiple joined tables. I now would like to group by 'person_ID' and get all the 'value' entries in one row, sorted by 'feature_ID'.
person_ID | feature_ID | value
123 | 1 | 1.1
123 | 2 | 1.2
123 | 3 | 1.3
123 | 4 | 1.2
124 | 1 | 1.0
124 | 2 | 1.1
...
The result should be:
123 | 1.1 | 1.2 | 1.3 | 1.2
124 | 1.0 | 1.1 | ...
There should exist an elegant SQL query solution, which I can neither come up with, nor find it.
For fast reconstruction that would be the example data:
create table example(person_ID integer, feature_ID integer, value float);
insert into example(person_ID, feature_ID, value) values
(123,1,1.1),
(123,2,1.2),
(123,3,1.3),
(123,4,1.2),
(124,1,1.0),
(124,2,1.1),
(124,3,1.2),
(124,4,1.4);
Edit: Every person has 6374 entries in the real life application.
I am using a PostgreSQL 8.3.23 database, but I think that should probably be solvable with standard SQL.
Databases aren't much good at transposing. There is a nebulous column-growth issue at hand; I mean, how does the database deal with a variable number of columns? It's not a spreadsheet.
This transposing of sorts is normally done in the report writer, not in SQL.
... or in a program, like in php.
In SQL, a dynamic cross tab can only be done via a procedure; see:
https://www.simple-talk.com/sql/t-sql-programming/creating-cross-tab-queries-and-pivot-tables-in-sql/
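When the set of feature_IDs is known in advance (the question mentions 6374 per person, so in practice you would generate the query string programmatically), the pivot can also be written as plain conditional aggregation, which works even on old PostgreSQL versions. A small sketch with the question's four features, run here on SQLite only for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE example (person_ID INTEGER, feature_ID INTEGER, value FLOAT);
INSERT INTO example VALUES
    (123,1,1.1),(123,2,1.2),(123,3,1.3),(123,4,1.2),
    (124,1,1.0),(124,2,1.1),(124,3,1.2),(124,4,1.4);
""")

# One MAX(CASE ...) column per feature_ID; with thousands of features
# this SELECT list would be generated by a script, not written by hand.
rows = conn.execute("""
SELECT person_ID,
       MAX(CASE WHEN feature_ID = 1 THEN value END) AS f1,
       MAX(CASE WHEN feature_ID = 2 THEN value END) AS f2,
       MAX(CASE WHEN feature_ID = 3 THEN value END) AS f3,
       MAX(CASE WHEN feature_ID = 4 THEN value END) AS f4
FROM example
GROUP BY person_ID
ORDER BY person_ID
""").fetchall()
print(rows)  # [(123, 1.1, 1.2, 1.3, 1.2), (124, 1.0, 1.1, 1.2, 1.4)]
```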

How to represent and insert into an ordered list in SQL?

I want to represent the list "hi", "hello", "goodbye", "good day", "howdy" (with that order), in a SQL table:
pk | i | val
------------
1 | 0 | hi
0 | 2 | hello
2 | 3 | goodbye
3 | 4 | good day
5 | 6 | howdy
'pk' is the primary key column. Disregard its values.
'i' is the "index" that defines that order of the values in the 'val' column. It is only used to establish the order and the values are otherwise unimportant.
The problem I'm having is with inserting values into the list while maintaining the order. For example, if I want to insert "hey" and I want it to appear between "hello" and "goodbye", then I have to shift the 'i' values of "goodbye" and "good day" (but preferably not "howdy") to make room for the new entry.
So, is there a standard SQL pattern to do the shift operation, but only shift the elements that are necessary? (Note that a simple "UPDATE table SET i=i+1 WHERE i>=3" doesn't work, because it violates the uniqueness constraint on 'i', and also it updates the "howdy" row unnecessarily.)
Or, is there a better way to represent the ordered list? I suppose you could make 'i' a floating point value and choose values between, but then you have to have a separate rebalancing operation when no such value exists.
Or, is there some standard algorithm for generating string values between arbitrary other strings, if I were to make 'i' a varchar?
Or should I just represent it as a linked list? I was avoiding that because I'd like to also be able to do a SELECT .. ORDER BY to get all the elements in order.
As I read your post, I kept thinking 'linked list', and at the end I still think that's the way to go.
If you are using Oracle, and the linked list is a separate table (or even the same table with a self referencing id - which i would avoid) then you can use a CONNECT BY query and the pseudo-column LEVEL to determine sort order.
You can easily achieve this by using a cascading trigger that, on insert/update, bumps any existing 'index' entry equal to the new one up by 1. This cascades through the rows until the first gap stops it - see the second example in this blog entry for a PostgreSQL implementation.
This approach should work independent of the RDBMS used, provided it offers support for triggers to fire before an update/insert. It basically does what you'd do if you implemented your desired behavior in code (increase all following index values until you encounter a gap), but in a simpler and more effective way.
Alternatively, if you can live with a restriction to SQL Server, check the hierarchyid type. While mainly geared at defining nested hierarchies, you can use it for flat ordering as well. It somewhat resembles your approach using floats, as it allows insertion between two positions by assigning fractional values, thus avoiding the need to update other entries.
If you don't use numbers, but Strings, you may have a table:
pk | i | val
------------
1 | a0 | hi
0 | a2 | hello
2 | a3 | goodbye
3 | b | good day
5 | b1 | howdy
You may insert a4 between a3 and b, a21 between a2 and a3, a1 between a0 and a2, and so on. You would need a clever function to generate an i for a new value v between p and n; the index can become longer and longer, or you need a big rebalancing from time to time.
Another approach could be, to implement a (double-)linked-list in the table, where you don't save indexes, but links to previous and next, which would mean, that you normally have to update 1-2 elements:
pk | prev | val
------------
1 | 0 | hi
0 | 1 | hello
2 | 0 | goodbye
3 | 2 | good day
5 | 3 | howdy
hey between hello & goodbye:
hey gets pk 6:
pk | prev | val
------------
1 | 0 | hi
0 | 1 | hello
6 | 0 | hey <- ins
2 | 6 | goodbye <- upd
3 | 2 | good day
5 | 3 | howdy
The previous element would be hello with pk=0, and goodbye, which linked to hello, now has to link to hey instead.
But I don't know whether an 'order by' mechanism for this exists on many DB implementations.
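For what it's worth, databases supporting recursive CTEs (PostgreSQL 8.4+, SQLite, SQL Server) can put a linked list in order without Oracle's CONNECT BY. A sketch run on SQLite, assuming the head row is marked with prev IS NULL (a slight deviation from the sample table above, where hi also carries a prev value):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (pk INTEGER PRIMARY KEY, prev INTEGER, val TEXT);
-- head ('hi') is marked by prev IS NULL; every other row points at its predecessor
INSERT INTO items VALUES
    (1, NULL, 'hi'), (0, 1, 'hello'), (6, 0, 'hey'),
    (2, 6, 'goodbye'), (3, 2, 'good day'), (5, 3, 'howdy');
""")

# Walk the list from the head, numbering each step so we can ORDER BY it.
ordered = conn.execute("""
WITH RECURSIVE walk(pk, val, pos) AS (
    SELECT pk, val, 1 FROM items WHERE prev IS NULL
    UNION ALL
    SELECT i.pk, i.val, w.pos + 1
    FROM items i JOIN walk w ON i.prev = w.pk
)
SELECT val FROM walk ORDER BY pos
""").fetchall()
print([v for (v,) in ordered])  # ['hi', 'hello', 'hey', 'goodbye', 'good day', 'howdy']
```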
Since I had a similar problem, here is a very simple solution:
Make your i column floats, but insert integer values for the initial data:
pk | i | val
------------
1 | 0.0 | hi
0 | 2.0 | hello
2 | 3.0 | goodbye
3 | 4.0 | good day
5 | 6.0 | howdy
Then, if you want to insert something in between, just compute a float value in the middle between the two surrounding values:
pk | i | val
------------
1 | 0.0 | hi
0 | 2.0 | hello
2 | 3.0 | goodbye
3 | 4.0 | good day
5 | 6.0 | howdy
6 | 2.5 | hey
This way the number of inserts between the same two values is limited by the resolution of float values, but for almost all cases that should be more than sufficient.
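The midpoint insert above can be sketched in a few lines; here against SQLite, with a small helper (names invented for illustration) that picks an i halfway between the two neighbours:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vals (pk INTEGER PRIMARY KEY, i REAL UNIQUE, val TEXT);
INSERT INTO vals VALUES
    (1, 0.0, 'hi'), (0, 2.0, 'hello'), (2, 3.0, 'goodbye'),
    (3, 4.0, 'good day'), (5, 6.0, 'howdy');
""")

def insert_between(conn, pk, val, before_val, after_val):
    """Insert val with an i halfway between two existing entries."""
    lo, hi = [conn.execute("SELECT i FROM vals WHERE val = ?", (v,)).fetchone()[0]
              for v in (before_val, after_val)]
    conn.execute("INSERT INTO vals VALUES (?, ?, ?)", (pk, (lo + hi) / 2, val))

insert_between(conn, 6, "hey", "hello", "goodbye")  # gets i = 2.5
order = [v for (v,) in conn.execute("SELECT val FROM vals ORDER BY i")]
print(order)  # ['hi', 'hello', 'hey', 'goodbye', 'good day', 'howdy']
```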