How do you return a specific column value of a certain row in an existing table within a database? - sql

The Problem:
I'm working in PostgreSQL 9.0 and I'm having a difficult time figuring out how to tackle the situation where you want to return a specific column value of a certain row for use in a CASE WHEN THEN statement.
Basically, I want to set the value of someColumn in a row of table A equal to the value of column A in row X of table B, where row X is identified by its column B value. (More detail in "Background Info" if needed to understand the question.)
This is what I want to do (but don't know how):
Update tableA
Set someColumn
CASE WHEN given_info_column = 'tableB: row X's column B value'
THEN (here I want to return row X's column A value, finding row X using the given column B value)
ELSE someColumn END
Background Info: (Optional, for clarification)
Imagine that there is a user activity table and a device table in an already existing database, with activity-performed strings that are already in use throughout the codebase you are working in (for example):
User_Activity:
id (int) | user_name (string) | activity_performed (string)            | category (string)
---------|--------------------|----------------------------------------|------------------
1        | Joe Martinez       | checked out iphone: iphone2            | dvc_activity
2        | Jon Shmoe          | uploads video from device: (id: 12345) | dvc_activity
3        | Larry David        | goes to the bathroom                   | other_activity
Device:
seq (int)| device_name (string) | device_srl_num (int) | device_status (string)|
---------+-----------------------+----------------------+-----------------------+
1 | iphone1 | 12344 | available
2 | iphone2 | 12345 | checked out
3 | android1 | 23456 | available
Your assignment from your boss is to create a report that shows one table with all device activity, like so:
Device Activity Report
act_seq (int) | usr_id (int) | usr_name (string) | act_performed (string)                 | dvc_name (string) | dvc_srl_num (int) | dvc_status (string)
--------------+--------------+-------------------+----------------------------------------+-------------------+-------------------+--------------------
1             | 1            | Joe Martinez      | checked out iphone: iphone2            | iphone2           | 12345             | checked out
2             | 2            | Jon Shmoe         | uploads video from device: (id: 12345) | iphone2           | 12345             | checked out
For the purposes of this question, this has to be done by adding a new column to the user activity table called dvc_seq which will be a foreign key to the device table. You will create a temporary table by querying from the user activity table and joining the two where User_Activity (dvc_seq) = Device (seq)
This is fine and will work great for new entries into the User_Activity table, which will record a dvc_seq linking to the associated device if the activity involves a device.
The problem is that you need to go in and fill in values for the new dvc_seq column in the User_Activity table for all previous entries relating to devices. Since the previous programmers decided to specify which device in the activity_performed column using the serial number certain times and the device names other times, this presents an interesting problem, where you will need to derive the associated Device seq number from a device, given its name or serial number.
So once again, what I want to do: (using this example)
UPDATE User_Activity
SET dvc_seq
CASE WHEN activity_performed LIKE 'checked out iphone:%'
THEN (seq column of Device table)
WHERE (SELECT 1 FROM Device WHERE device_name = (substring in place of the %))
ELSE dvc_seq (I think this would be null since there would be nothing here yet)
END
Can any of you help me accomplish this?? Thanks in advance for all responses and advice!

The query below uses an update-join to set the sequence number when the serial number or the device name is contained in activity_performed. In PostgreSQL the target table is named once after UPDATE, the other table goes in FROM, and the SET column is not prefixed with an alias:
UPDATE UserActivity AS a
SET dvc_seq = b.seq
FROM devices AS b
WHERE a.activity_performed LIKE '%from device: (id: ' || b.serial_num || '%'
OR a.activity_performed LIKE '%: ' || b.name || '%'
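As a rough way to try the idea without a PostgreSQL server, here is a minimal Python sketch using the standard-library sqlite3 module. It is not the update-join itself: it swaps in a correlated subquery, and the lowercase table names, the stricter LIKE patterns, and the helper names build_demo_db and backfill_dvc_seq are assumptions made up for illustration. The data is the sample from the question.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE device (
            seq INTEGER PRIMARY KEY,
            device_name TEXT,
            device_srl_num INTEGER,
            device_status TEXT
        );
        CREATE TABLE user_activity (
            id INTEGER PRIMARY KEY,
            user_name TEXT,
            activity_performed TEXT,
            category TEXT,
            dvc_seq INTEGER REFERENCES device (seq)
        );
        INSERT INTO device VALUES
            (1, 'iphone1', 12344, 'available'),
            (2, 'iphone2', 12345, 'checked out'),
            (3, 'android1', 23456, 'available');
        INSERT INTO user_activity (id, user_name, activity_performed, category) VALUES
            (1, 'Joe Martinez', 'checked out iphone: iphone2', 'dvc_activity'),
            (2, 'Jon Shmoe', 'uploads video from device: (id: 12345)', 'dvc_activity'),
            (3, 'Larry David', 'goes to the bathroom', 'other_activity');
    """)
    return conn


def backfill_dvc_seq(conn):
    """Fill dvc_seq for old rows by matching a serial number or a device name."""
    conn.execute("""
        UPDATE user_activity
        SET dvc_seq = (
            SELECT d.seq
            FROM device AS d
            WHERE user_activity.activity_performed
                      LIKE '%(id: ' || d.device_srl_num || ')%'
               OR user_activity.activity_performed
                      LIKE '%: ' || d.device_name
            ORDER BY d.seq
            LIMIT 1
        )
        WHERE dvc_seq IS NULL
    """)
    conn.commit()


conn = build_demo_db()
backfill_dvc_seq(conn)
rows = conn.execute("SELECT id, dvc_seq FROM user_activity ORDER BY id").fetchall()
print(rows)
```

Rows that mention no device keep a NULL dvc_seq, because the subquery finds nothing for them.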

Just an additional update on how to speed up this code, based on the correct answer given by @FuzzyTree. (This only works for the serial number, which has a fixed length, and not for the device name, which can vary in length.)
Because of the LIKE used in the join, the query runs very slowly on large databases. A faster solution uses the PostgreSQL substring() and position() functions and joins the tables on the serial number, like so:
UPDATE UserActivity AS a
SET dvc_seq = b.seq
FROM devices AS b
WHERE a.activity_performed LIKE '%from device: (id: %'
AND b.serial_num::text = -- cast so the comparison also works when serial_num is an integer
substring(a.activity_performed
from position('from device: (id: ' in a.activity_performed) + 18 -- 18 = length of 'from device: (id: ', so this is where the serial number starts
for (lengthOfSerialNumberHere)); -- replace with the fixed length of your serial numbers
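To make the offset arithmetic concrete, here is a small Python sketch of the same find-then-slice step. The function name extract_serial and the default length of 5 (taken from the sample serial numbers in the question) are assumptions for illustration only.

```python
PREFIX = "from device: (id: "  # 18 characters long


def extract_serial(activity, serial_length=5):
    """Return the fixed-length serial number that follows PREFIX, or None."""
    start = activity.find(PREFIX)  # 0-based, whereas SQL position() is 1-based
    if start == -1:
        return None
    begin = start + len(PREFIX)  # first character of the serial number
    return activity[begin:begin + serial_length]


print(extract_serial("uploads video from device: (id: 12345)"))
print(extract_serial("goes to the bathroom"))
```

Once the serial number is pulled out as a plain value, the join can compare it with equality instead of running a LIKE pattern per device.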

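In PostgreSQL a searched CASE expression such as
CASE WHEN activity_performed LIKE 'checked out iphone:%'
is valid, but a CASE branch cannot carry its own WHERE clause, and parsing the activity string inline quickly becomes hard to read.
A cleaner option is to create a function and call it from the UPDATE:
UPDATE User_Activity
SET dvc_seq = getDeviceID(User_Activity.activity_performed);
Inside the function, IF and CASE logic is much easier to write: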
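CREATE OR REPLACE FUNCTION getDeviceID(activity text)
RETURNS integer AS
$BODY$
DECLARE
device_id integer;
BEGIN
-- Table and column names follow the Device table in the question.
-- The LIKE patterns are assumptions about how the activity strings are formatted.
-- First try the serial number, e.g. 'uploads video from device: (id: 12345)'
SELECT d.seq INTO device_id
FROM Device d
WHERE activity LIKE '%(id: ' || d.device_srl_num || ')%'
LIMIT 1;
-- Fall back to the device name, e.g. 'checked out iphone: iphone2'
IF device_id IS NULL THEN
SELECT d.seq INTO device_id
FROM Device d
WHERE activity LIKE '%: ' || d.device_name
LIMIT 1;
END IF;
-- Still NULL when the activity does not mention a device
RETURN device_id;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION getDeviceID(text)
OWNER TO postgres;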

Issue displaying empty value of repeated columns in Google Data Studio

I've got an issue when trying to visualize in Google Data Studio some information from a denormalized table.
Context: I want to gather all the contacts of a company and their related orders in a table in BigQuery. Contacts can have no orders or multiple orders. Following BigQuery best practice, this table is denormalized and all the orders for a client are in arrays of structs. It looks like this:
Fields Examples:
+-------+------------+-------------+-----------+
| Row # | Contact_Id | Orders.date | Orders.id |
+-------+------------+-------------+-----------+
|- 1 | 23 | 2019-02-05 | CB1 |
| | | 2020-03-02 | CB293 |
|- 2 | 2321 | - | - |
|- 3 | 77 | 2010-09-03 | AX3 |
+-------+------------+-------------+-----------+
The issue is when I want to use this table as a data source in Data Studio.
For instance, if I build a table with Contact_Id as a dimension, everything is fine and I can see all my contacts. However, if I add any dimension from the Orders struct, all info from contacts with no orders is no longer displayed. For instance, all info from Contact_Id 2321 is removed from the table.
Have you found any workaround to visualize these empty arrays (for instance as null values)?
The only solution I've found is to build an intermediary table with the orders unnested.
The way I've just discovered to work around this is to add an extra field in my DS-> BQ connector:
ARRAY_LENGTH(fields.orders) AS numberoforders
This will return zero if the array is empty - you can then create calculated fields within DataStudio - using the "numberoforders" field to force values to NULL or zero.
You can fix this behaviour by slightly changing your query in the BigQuery connector.
Instead of doing this:
SELECT
Contact_id,
Orders
FROM myproject.mydataset.mytable
try this:
SELECT
Contact_id,
IF(ARRAY_LENGTH(Orders) > 0, Orders, [STRUCT(CAST(NULL AS DATE) AS date, CAST(NULL AS STRING) AS id)]) AS Orders
FROM myproject.mydataset.mytable
This way you are forcing your repeated field to have, at least, an array containing NULL values and hence Data Studio will represent those missing values.
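One way to picture why the padding helps is to model the report as one flat row per array element, so a contact with an empty array contributes no rows at all. The Python sketch below illustrates that model only; it is an assumption about the behaviour, not a description of Data Studio internals, and the names pad_empty_orders and flatten are made up.

```python
def pad_empty_orders(rows):
    """Give every contact at least one order entry, mirroring the IF(ARRAY_LENGTH(...)) query."""
    padded = []
    for row in rows:
        orders = row["Orders"] if row["Orders"] else [{"date": None, "id": None}]
        padded.append({"Contact_Id": row["Contact_Id"], "Orders": orders})
    return padded


def flatten(rows):
    """Emit one (contact, date, order id) tuple per order, like a flat report table."""
    return [(r["Contact_Id"], o["date"], o["id"]) for r in rows for o in r["Orders"]]


contacts = [
    {"Contact_Id": 23, "Orders": [{"date": "2019-02-05", "id": "CB1"},
                                  {"date": "2020-03-02", "id": "CB293"}]},
    {"Contact_Id": 2321, "Orders": []},
    {"Contact_Id": 77, "Orders": [{"date": "2010-09-03", "id": "AX3"}]},
]

print(flatten(contacts))                    # contact 2321 is missing
print(flatten(pad_empty_orders(contacts)))  # contact 2321 appears with empty order fields
```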
Also, if you want to create new calculated fields using one of the nested fields, you should first check whether the value is NULL, to avoid turning the NULL placeholders into real values. For example, if you have a repeated and nested field which can be 1 or 0, and you want to create a calculated field swapping the value, you should do:
IF(myfield.key IS NOT NULL, IF(myfield.key = 1, 0, 1), NULL)
Here you can see what happens if you check before swapping and if you don't:
Original value | No check | Check
---------------+----------+-------
1              | 0        | 0
0              | 1        | 1
NULL           | 1        | NULL
1              | 0        | 0
NULL           | 1        | NULL
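The same comparison can be reproduced with a few lines of Python, using None in place of NULL. The function names are hypothetical; each function only mirrors the calculated-field expression named in its docstring.

```python
def swap_without_check(value):
    """Mirror IF(key = 1, 0, 1): a NULL (None) falls into the else branch and becomes 1."""
    return 0 if value == 1 else 1


def swap_with_check(value):
    """Mirror IF(key IS NOT NULL, IF(key = 1, 0, 1), NULL): NULL stays NULL."""
    if value is None:
        return None
    return 0 if value == 1 else 1


originals = [1, 0, None, 1, None]
for value in originals:
    print(value, swap_without_check(value), swap_with_check(value))
```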

How to do an exact match followed by ORDER BY in PostgreSQL

I'm trying to write a query that puts some results (in my case a single result) at the top, and then sorts the rest. I have yet to find a PostgreSQL solution.
Say I have a table called airports like so.
id | code | display_name
----+------+----------------------------
1 | SDF | International
2 | INT | International Airport
3 | TES | Test
4 | APP | Airport Place International
In short, I have a query in a controller method that gets called asynchronously when a user text searches for an airport either by code or display_name. However, when a user types in an input that matches a code exactly (airport code is unique), I want that result to appear first, and all airports that also have int in their display_name to be displayed afterwards in ascending order. If there is no exact match, it should return any wildcard matches sorted by display_name ascending. So if a user types in INT, The row (2, INT, International Airport) should be returned first followed by the others:
Results:
1. INT | International Airport
2. APP | Airport Place International
3. SDF | International
Here's the kind of query I was tinkering with that is slightly simplified to make sense outside the context of my application but same concept nonetheless.
SELECT * FROM airports a
WHERE a.display_name LIKE 'somesearchtext%'
ORDER BY (CASE WHEN a.code = 'somesearchtext' THEN a.code ELSE a.display_name END)
Right now, if I type INT, I get:
Results:
1. APP | Airport Place International
2. INT | International Airport
3. SDF | International
My ORDER BY must be incorrect but I can't seem to get it
Any help would be greatly appreciated :)
If you want an exact match on code to return first, then I think this does the trick:
SELECT a.*
FROM airports a
WHERE a.display_name LIKE 'somesearchtext%'
ORDER BY (CASE WHEN a.code = 'somesearchtext' THEN 1 ELSE 2 END),
a.display_name
You could also write this as:
ORDER BY (a.code = 'somesearchtext') DESC, a.display_name
This isn't standard SQL, but it is quite readable.
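To check the ordering against the sample table, here is a minimal Python sketch using the standard-library sqlite3 module. Some assumptions differ from the original query: the WHERE clause also accepts an exact code match, the pattern is '%text%' so that "Airport Place International" is found, and SQLite's LIKE is case-insensitive for ASCII (in PostgreSQL you would likely want ILIKE for the same effect). The function name search_airports is made up.

```python
import sqlite3


def search_airports(conn, text):
    """Return matching airports with an exact code match first, then by display_name."""
    return conn.execute(
        """
        SELECT code, display_name
        FROM airports
        WHERE code = :code OR display_name LIKE '%' || :text || '%'
        ORDER BY CASE WHEN code = :code THEN 1 ELSE 2 END,
                 display_name
        """,
        {"code": text.upper(), "text": text},
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE airports (id INTEGER PRIMARY KEY, code TEXT UNIQUE, display_name TEXT)"
)
conn.executemany(
    "INSERT INTO airports VALUES (?, ?, ?)",
    [
        (1, "SDF", "International"),
        (2, "INT", "International Airport"),
        (3, "TES", "Test"),
        (4, "APP", "Airport Place International"),
    ],
)
print(search_airports(conn, "INT"))
```

The CASE expression only produces a sort bucket (1 or 2), so the exact match wins first and display_name breaks ties inside each bucket.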
I think you can achieve your goal by using a UNION.
First get the exact match, then append the rest of the data, sorted as you wish.
For example (you will need to adapt this a bit; sort_group is just a helper column that keeps the exact match on top):
SELECT * FROM (
SELECT 1 AS sort_group, a.* FROM airports a
WHERE a.code = 'somesearchtext'
UNION ALL
SELECT 2 AS sort_group, a.* FROM airports a
WHERE a.code <> 'somesearchtext' AND a.display_name LIKE 'somesearchtext%'
) AS matches
ORDER BY sort_group, display_name

Single record buffering in SAP ABAP

My table is stud.
+-----+------+-------+
| no | name | grade |
+-----+------+-------+
| 101 | naga | A |
| 102 | raj | A |
| 103 | john | A |
+-----+------+-------+
The query I'm using is:
SELECT * FROM stud WHERE no = 101 AND grade = 'A'.
If I am using single record buffering, how much data is stored in the buffer area?
This query doesn't do anything: there is no INTO clause, so it won't store anything it selects.
You are probably looking to do something like this....
SELECT * FROM stud into wa_stud WHERE no = 101 AND grade = 'A'.
"processing of each single row is performed here
endselect.
Or perhaps something like this, where only one row (the first row ordered by primary key) is selected:
select single * from stud into wa_stud where no = 101 and grade = 'A' .
Or perhaps you want all matching rows brought into an internal table, for the case where no and grade do not make up the full primary key:
select * from stud into table it_stud where no = 101 and grade = 'A'.
this is from ABAP Keyword documentation in SE38:
SAP Buffer - Single Record Buffering
Only those rows in the table are buffered that are actually accessed.
This requires less space in the buffer than when using generic or full
buffering. On the other hand, more administration work is required and
significantly more direct database accesses.
So since your query returns a single record (based on the data you displayed), it should fetch just that one row and hold it in the buffer.
I'd suggest looking at SAP help and Google - also have a look at SELECT SINGLE and incompletely specified keys - there used to be a problem with the buffer being bypassed in some situations - have a read for reference.

Query with conditions on multiple value column

I am building a report in Oracle Apex 4.2. The table that the report is built on has multiple values inside one of its columns.
-----------------------------------
| ID | NAME | PROJECT_ID |
-----------------------------------
| 1 | P1 | 23:45:56 |
| 2 | P2 | 23 |
| 3 | P3 | 45:65 |
-----------------------------------
I would like to build a query to retrieve names based on project_id's.
Select name from table where project_id = 23;
This obviously will return P2 only however I would like to build a query which would return P1 and P2 if we searched for 23.
Any help greatly appreciated.
You can use LIKE instead of = :
Select name from table where project_id LIKE '%23%';
If you've got a common delimiter such as the ':' in your example you could use the following to exclude results like '123':
SELECT name FROM table WHERE ':' || project_id || ':' LIKE '%:23:%'
By concatenating the delimiter to the front and back of the string, you don't have to write multiple criteria: LIKE '23:%' OR LIKE '%:23:%' OR LIKE '%:23' to handle the first and last number in the list.
This is a common design in Apex due to its builtin support for colon-delimited strings (e.g. to drive shuttle controls and other item types).
I generally use this pattern:
Select name from table where INSTR(':'||project_id||':',':23:') > 0;
P.S. It's a pity about that column name - I would have called it something like PROJECT_ID_LIST.
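The delimiter-wrapping trick is easy to check outside the database. Here is a minimal Python sketch; the function name has_project and the extra row P4 are hypothetical, added only to show that 123 and 230 are not mistaken for 23.

```python
def has_project(project_id_list, wanted, delimiter=":"):
    """True when `wanted` is one whole entry of the delimited list."""
    wrapped = f"{delimiter}{project_id_list}{delimiter}"
    return f"{delimiter}{wanted}{delimiter}" in wrapped


rows = [("P1", "23:45:56"), ("P2", "23"), ("P3", "45:65"), ("P4", "123:230")]
matches = [name for name, ids in rows if has_project(ids, 23)]
print(matches)
```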

cloning hierarchical data

Let's assume I have a self-referencing hierarchical table built the classical way, like this one:
CREATE TABLE test
(name text, id serial primary key, parent_id integer references test);
insert into test (name, id, parent_id) values
('root1',1,NULL),('root2',2,NULL),('root1sub1',3,1),('root1sub2',4,1),
('root2sub1',5,2),('root2sub2',6,2);
testdb=# select * from test;
name | id | parent_id
-----------+----+-----------
root1 | 1 |
root2 | 2 |
root1sub1 | 3 | 1
root1sub2 | 4 | 1
root2sub1 | 5 | 2
root2sub2 | 6 | 2
What I need now is a function (preferably in plain SQL) that would take the id of a test record and clone all attached records (including the given one). The cloned records need to have new ids, of course. The desired result would look like this, for example:
Select * from cloningfunction(2);
name | id | parent_id
-----------+----+-----------
root2 | 7 |
root2sub1 | 8 | 7
root2sub2 | 9 | 7
Any pointers? I'm using PostgreSQL 8.3.
Pulling this result in recursively is tricky (although possible). However, it's typically not very efficient and there is a much better way to solve this problem.
Basically, you augment the table with an extra column which traces the tree to the top - I'll call it the "Upchain". It's just a long string that looks something like this:
name | id | parent_id | upchain
root1 | 1 | NULL | 1:
root2 | 2 | NULL | 2:
root1sub1 | 3 | 1 | 1:3:
root1sub2 | 4 | 1 | 1:4:
root2sub1 | 5 | 2 | 2:5:
root2sub2 | 6 | 2 | 2:6:
root1sub1sub1 | 7 | 3 | 1:3:7:
It's very easy to keep this field updated by using a trigger on the table. (Apologies for terminology but I have always done this with SQL Server). Every time you add or delete a record, or update the parent_id field, you just need to update the upchain field on that part of the tree. That's a trivial job because you just take the upchain of the parent record and append the id of the current record. All child records are easily identified using LIKE to check for records with the starting string in their upchain.
What you're doing effectively is trading a bit of extra write activity for a big saving when you come to read the data.
When you want to select a complete branch in the tree it's trivial. Suppose you want the branch under node 1. Node 1 has an upchain '1:' so you know that any node in the branch of the tree under that node must have an upchain starting '1:...'. So you just do this:
SELECT *
FROM table
WHERE upchain LIKE '1:%'
This is extremely fast (index the upchain field of course). As a bonus it also makes a lot of activities extremely simple, such as finding partial trees, level within the tree, etc.
I've used this in applications that track large employee reporting hierarchies but you can use it for pretty much any tree structure (parts breakdown, etc.)
Notes (for anyone who's interested):
I haven't given a step-by-step of the SQL code but once you get the principle, it's pretty simple to implement. I'm not a great programmer so I'm speaking from experience.
If you already have data in the table you need to do a one time update to get the upchains synchronised initially. Again, this isn't difficult as the code is very similar to the UPDATE code in the triggers.
This technique is also a good way to identify circular references which can otherwise be tricky to spot.
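Since no step-by-step code is given above, here is a minimal Python sketch of the principle: derive each upchain from the parent's upchain, then select a branch by prefix. The names build_upchains and branch are made up for illustration, and an in-memory dict stands in for the table and its trigger.

```python
def build_upchains(rows):
    """Map each id to its upchain string, e.g. 7 -> '1:3:7:'.

    `rows` is an iterable of (id, parent_id) pairs; parent_id is None for roots.
    """
    parent_of = dict(rows)
    upchains = {}

    def upchain(node_id, seen=()):
        if node_id in seen:
            raise ValueError(f"circular reference at id {node_id}")
        if node_id not in upchains:
            parent = parent_of[node_id]
            # A root has an empty prefix; any other node extends its parent's upchain.
            prefix = "" if parent is None else upchain(parent, seen + (node_id,))
            upchains[node_id] = f"{prefix}{node_id}:"
        return upchains[node_id]

    for node_id in parent_of:
        upchain(node_id)
    return upchains


def branch(upchains, node_id):
    """Ids in the branch rooted at node_id (the node itself included)."""
    prefix = upchains[node_id]
    return sorted(i for i, chain in upchains.items() if chain.startswith(prefix))


rows = [(1, None), (2, None), (3, 1), (4, 1), (5, 2), (6, 2), (7, 3)]
chains = build_upchains(rows)
print(chains[7])
print(branch(chains, 1))
```

The startswith test plays the role of LIKE '1:%', and the ValueError shows how a circular reference surfaces while the upchains are being built.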
Joe Celko's method, which is similar to njreed's answer but more generic, can be found here:
Nested-Set Model of Trees (at the middle of the article)
Nested-Set Model of Trees, part 2
Trees in SQL -- Part III
@Maximilian: You are right, we forgot your actual requirement. How about a recursive stored procedure? I am not sure if this is possible in PostgreSQL, but here is a working SQL Server version:
CREATE PROCEDURE CloneNode
@to_clone_id int, @parent_id int
AS
SET NOCOUNT ON
DECLARE @new_node_id int, @child_id int
INSERT INTO test (name, parent_id)
SELECT name, @parent_id FROM test WHERE id = @to_clone_id
SET @new_node_id = @@IDENTITY
DECLARE @children_cursor CURSOR
SET @children_cursor = CURSOR FOR
SELECT id FROM test WHERE parent_id = @to_clone_id
OPEN @children_cursor
FETCH NEXT FROM @children_cursor INTO @child_id
WHILE @@FETCH_STATUS = 0
BEGIN
EXECUTE CloneNode @child_id, @new_node_id
FETCH NEXT FROM @children_cursor INTO @child_id
END
CLOSE @children_cursor
DEALLOCATE @children_cursor
Your example is accomplished by EXECUTE CloneNode 2, null (the second parameter is the new parent node).
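For readers without SQL Server, the same recursion can be sketched in Python with the standard-library sqlite3 module. This is an illustration under assumptions, not a port of the procedure: the function name clone_node is made up, SQLite's INTEGER PRIMARY KEY stands in for the serial column, and the children are read before the copy is inserted so that a clone is never picked up as its own child.

```python
import sqlite3


def clone_node(conn, to_clone_id, parent_id=None):
    """Copy one node under parent_id, then recurse into its children; return the new id."""
    row = conn.execute("SELECT name FROM test WHERE id = ?", (to_clone_id,)).fetchone()
    if row is None:
        raise LookupError(f"no row with id {to_clone_id}")
    # Read the child ids fully before inserting anything new.
    children = conn.execute(
        "SELECT id FROM test WHERE parent_id = ? ORDER BY id", (to_clone_id,)
    ).fetchall()
    cur = conn.execute(
        "INSERT INTO test (name, parent_id) VALUES (?, ?)", (row[0], parent_id)
    )
    new_id = cur.lastrowid
    for (child_id,) in children:
        clone_node(conn, child_id, new_id)
    return new_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (name TEXT, id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES test (id))"
)
conn.executemany(
    "INSERT INTO test (name, id, parent_id) VALUES (?, ?, ?)",
    [
        ("root1", 1, None), ("root2", 2, None), ("root1sub1", 3, 1),
        ("root1sub2", 4, 1), ("root2sub1", 5, 2), ("root2sub2", 6, 2),
    ],
)
new_root = clone_node(conn, 2)
cloned = conn.execute(
    "SELECT name, id, parent_id FROM test WHERE id >= ? ORDER BY id", (new_root,)
).fetchall()
print(cloned)
```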
This sounds like an exercise from "SQL For Smarties" by Joe Celko...
I don't have my copy handy, but I think it's a book that'll help you quite a bit if this is the kind of problem you need to solve.