Take a simple table like below:
Agent's Name | Time Logged In | Center
-------------|----------------|-------------
Andrew       | 12:30 PM       | Home Base
Jeff         | 7:00 AM        | Virtual Base
Ryan         | 6:30 PM        | Test Base
Now let's say that a single cell is deleted, so the table now looks like this:
Agent's Name | Time Logged In | Center
-------------|----------------|-------------
Andrew       | 12:30 PM       |
Jeff         | 7:00 AM        | Virtual Base
Ryan         | 6:30 PM        | Test Base
Notice that "Home Base" is missing. Now in excel you can delete the cell and shift the rest so the finished product looks like below:
Agent's Name | Time Logged In | Center
-------------|----------------|-------------
Andrew       | 12:30 PM       | Virtual Base
Jeff         | 7:00 AM        | Test Base
Ryan         | 6:30 PM        |
And you can see we are left with a blank cell in the last row.
How do I code this procedure of shifting the cells up in SQL?
I've been struggling on this problem for weeks! Thank you!
You can't - SQL tables aren't Excel sheets; they just don't have that kind of structure. No matter how hard you try, you won't be able to do something like that. It's just fundamentally different.
SQL Server tables have rows and columns, sure, but they have no implied order. You cannot "shift" a row up - there's no "up" per se; it all depends on your ordering.
It's worse than comparing apples to oranges - it's like comparing apples to granite blocks. It's just not the same; don't waste your time trying to make it the same.
One of many options is to use an outer apply to fetch the Center from the next row:
declare @t table (name varchar(50), login time, center varchar(50))

insert into @t (name, login, center)
select 'Andrew', '12:30 PM', 'Home Base'
union all select 'Jeff', '7:00 AM', 'Virtual Base'
union all select 'Ryan', '6:30 PM', 'Test Base'

-- overwrite each row's center with the center of the "next" row (ordered by name)
update t1
set t1.center = t3.center
from @t t1
outer apply (
    select top 1 t2.center
    from @t t2
    where t2.name > t1.name
    order by t2.name
) t3

select * from @t
You do have to specify an ordering (the example orders by name).
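With the sample rows above, the final select returns the shifted centers, matching the desired result from the question (only name and center shown; rows ordered by name):

name    | center
--------|-------------
Andrew  | Virtual Base
Jeff    | Test Base
Ryan    | NULL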
If you have two sets and you simply want to assign each item in the second set to an item from the first set, "using them up" as you go, the simplest approach is to apply ROW_NUMBER() to both sets and LEFT JOIN them on the ROW_NUMBER() column, assigning each row the next available item:
WITH s1 AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY sortorder) AS rownum -- choice of ORDER BY is obviously important
    FROM set1
),
s2 AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY sortorder) AS rownum
    FROM set2
)
SELECT *
FROM s1
LEFT JOIN s2
    ON s1.keys = s2.keys -- if there are any keys
    AND s1.rownum = s2.rownum
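For a concrete instance of the pattern, here is a hypothetical version against the @t table variable from the previous example; pairing each agent (ordered by name) with the center one position later reproduces the "shift up" from the question.

with agents as (
    select name, login, row_number() over (order by name) as rownum
    from @t
),
centers as (
    select center, row_number() over (order by name) as rownum
    from @t
)
select a.name, a.login, c.center
from agents a
left join centers c
    on c.rownum = a.rownum + 1 -- offset by one: each agent takes the next row's center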
I am trying to summarize some massive tables in a way that can help me investigate data issues further. There are hundreds of thousands of rows and roughly 80+ columns of data in most of these tables.
From each table, I have already run queries to throw out any columns that are all null or hold only one value. I've done this for two reasons: for my purposes, a single value or all nulls in a column is not interesting and provides little information about the data; additionally, the next step is to query each of the remaining columns and return up to 30 distinct values per column (if a column has more than 30 distinct values, show the first 30).
Here is the general format of output I wish to create:
Column_name(total_num_distinct): distinct_val1(val1_count), distinct_val2(val2_count), ... distinct_val30(val30_count)
Assuming my data fields are integers, floats, and varchar2 data types, this is the SQL I was trying to use to generate that output:
declare
begin
    for rw in (
        select column_name colnm, num_distinct numd
        from all_tab_columns
        where owner = 'scz'
          and table_name like 'tbl1'
          and num_distinct > 1
        order by num_distinct desc
    ) loop
        dbms_output.put(rw.colnm || '(' || rw.numd || '): ');
        for rw2 in (
            select dis_val, cnt from (
                select rw.colnm dis_val, count(*) cnt
                from tbl1
                group by rw.colnm
                order by 2 desc
            ) where rownum <= 30
        ) loop
            dbms_output.put(rw2.dis_val || '(' || rw2.cnt || '), ');
        end loop;
        dbms_output.put_line(' ');
    end loop;
end;
I get the output I expect from the 1st loop, but the 2nd loop, which is supposed to output examples of the unique values in each column along with the frequency of the 30 most frequent values, is not working as I intended. Instead of seeing unique values along with the number of times each value occurs in the field, I get the column names and the total record count of the table.
If the 1st loop suggests the first 4 columns in 'tbl1' with more than 1 distinct value are the following:
colnm | numd
------|-----
Col1  |    2
Col3  |    4
Col7  |   17
Col12 |   30
... then the full output of the 1st and 2nd loops together looks something like the following from my SQL:
Col1(2): Col1(tbl1_tot_rec_count), Col3(tbl1_tot_rec_count)
Col3(4): Col1(tbl1_tot_rec_count), Col3(tbl1_tot_rec_count), Col7(tbl1_tot_rec_count), Col12(tbl1_tot_rec_count)
Col7(17): Col1(tbl1_tot_rec_count), Col3(tbl1_tot_rec_count), Col7(tbl1_tot_rec_count), Col12(tbl1_tot_rec_count), .... , ColX(tbl1_tot_rec_count)
Col12(30): Col1(tbl1_tot_rec_count), Col3(tbl1_tot_rec_count), Col7(tbl1_tot_rec_count), Col12(tbl1_tot_rec_count), .... , ColX(tbl1_tot_rec_count)
The output looks cleaner when real data is output: each table produces somewhere between 20 and 50 lines of output (i.e. columns with more than 2 values), and listing 30 unique values for each field (with their counts) only requires a little scrolling, which isn't impractical. Just to give you an idea with fake values, the output would look more like this if it were working correctly:
Col1(2): DisVal1(874,283), DisVal2(34,578),
Col3(4): DisVal1(534,223), DisVal2(74,283), DisVal3(13,923), null(2348)
Col7(17): DisVal1(54,223), DisVal2(14,633), DisVal3(13,083), DisVal4(12,534), DisVal5(9,876), DisVal6(8,765), DisVal7(7654), DisVal8(6543), DisVal9(5432), ...., ...., ...., ...., ...., ...., ...., DisVal17(431)
I am not an Oracle or SQL guru, so I might not be approaching this problem in the easiest or most efficient way. While I appreciate any better ways to approach this problem, I also want to learn why the SQL above is not giving me the output I expect. My goal is to run a single query that can quickly tell me which columns have interesting data I might want to examine further in each table. I have about 20 tables to examine, all of similar dimensions, which makes comprehensive inspection very difficult. Reducing the tables this way, to see what combinations of values may exist across the various fields, would be very helpful for further queries that deep dive into the intricacies of the data.
It's because

select rw.colnm dis_val, count(*) cnt from tbl1 group by rw.colnm order by 2 desc

is not doing at all what you think, and what you want done can't be done without dynamic SQL. Since rw.colnm is just a string variable, what it actually runs is

select 'a_column_of_tbl1' dis_val, count(*) cnt from tbl1 group by 'a_column_of_tbl1' order by 2 desc

i.e. it groups the whole table by a constant, which is why you see the column name with the total record count. What you need is to execute the SQL dynamically:

'select ' || rw.colnm || ' dis_val, count(*) cnt from tbl1 group by ' || rw.colnm || ' order by 2 desc'
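For illustration, here is a minimal sketch of a dynamic version of your block using a ref cursor (untested; it assumes your tbl1, and wraps the column in to_char so that number and date columns can be fetched as strings):

declare
    cur   sys_refcursor;
    v_val varchar2(4000);
    v_cnt number;
begin
    for rw in (
        select column_name colnm, num_distinct numd
        from all_tab_columns
        where owner = 'scz'
          and table_name like 'tbl1'
          and num_distinct > 1
        order by num_distinct desc
    ) loop
        dbms_output.put(rw.colnm || '(' || rw.numd || '): ');
        -- build and execute the statement dynamically so the real column is used
        open cur for
               'select dis_val, cnt from ('
            || '  select to_char(' || rw.colnm || ') dis_val, count(*) cnt'
            || '  from tbl1 group by ' || rw.colnm || ' order by 2 desc'
            || ') where rownum <= 30';
        loop
            fetch cur into v_val, v_cnt;
            exit when cur%notfound;
            dbms_output.put(v_val || '(' || v_cnt || '), ');
        end loop;
        close cur;
        dbms_output.put_line(' ');
    end loop;
end;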
Here is the (beta) query you can use to generate the SQL to execute (I tested with the USER_* views instead of ALL_* to avoid getting too many results here):
select utc.table_name, utc.column_name,
REPLACE( REPLACE(q'~select col_name, col_value, col_counter
from (
SELECT col_name, col_value, col_counter,
ROW_NUMBER() OVER(ORDER BY col_counter DESC) AS rn
FROM (
select '{col_name}' AS col_name,
{col_name} AS col_value, COUNT(*) as col_counter
from {table_name}
group by {col_name}
)
)
WHERE rn <= 30~', '{col_name}', utc.column_name), '{table_name}', utc.table_name) AS sql
from user_tab_columns utc
where not exists(
select 1
from user_indexes ui
join user_ind_columns uic on uic.table_name = ui.table_name
and uic.index_name = ui.index_name
where
ui.table_name = utc.table_name
and exists (
select 1 from user_ind_columns t where t.table_name = ui.table_name
and t.index_name = uic.index_name and t.column_name = utc.column_name
)
group by ui.table_name, ui.index_name having( count(*) = 1 )
)
and not exists(
SELECT 1
FROM user_constraints uc
JOIN user_cons_columns ucc ON ucc.constraint_name = uc.constraint_name
WHERE constraint_type IN (
'P', -- primary key
'U' -- unique
)
AND uc.table_name = utc.table_name AND ucc.column_name = utc.column_name
)
;
I have a column Conveyor with conveyor name entries in the table Report_Line, which I want to replace with the conveyor number:
Belt - 1 | Slack - 2 | Chain - 3
It's a real-time scenario: as soon as a row is added, the conveyor name should be replaced with its respective number.
I tried a REPLACE query with a UNION statement, but it didn't work and throws an error:
SELECT TOP 1 *
FROM Report_Line
ORDER BY Serial_no DESC
SELECT REPLACE(Conveyor, 'Slack', '2')
UNION
SELECT REPLACE(Conveyor, 'Belt', '1')
UNION
SELECT REPLACE(Conveyor, 'Chain', '3')
GO
You could try using an inline, hard-coded table, and join to it by the Conveyor names.
Something like this:
Select Top 1 *
From Report_Line
Left Join (values ('Slack', '2'), ('Belt', '1'), ('Chain', '3')) sidetable(conveyor_name, id)
    on sidetable.conveyor_name = Report_Line.Conveyor
Order by Serial_no DESC
GO
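If the goal is to persist the numbers rather than translate them on every read, the same inline table also works in an UPDATE (a sketch using the question's table and column names; for truly automatic replacement on insert you would put similar logic in a trigger):

update r
set r.Conveyor = sidetable.id
from Report_Line r
join (values ('Slack', '2'), ('Belt', '1'), ('Chain', '3')) sidetable(conveyor_name, id)
    on sidetable.conveyor_name = r.Conveyor
GO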
I use SqlPlus.
There are a lot of solutions and examples out there for related problems, but I haven't been able to fix my issue.
Expected result: one line that gives information about the library member who borrowed a book for the longest time, displaying the amount of time, e.g. "Johnson John has ...: 31 days".
My current query:
SELECT DISTINCT m.firstname || ' ' || m.lastname || ' has borrowed for the longest time: '
    || ROUND(MAX(l.date_of_return - l.date_borrowed)) || ' days' "Longest time borrowed"
FROM loans l
JOIN members m
ON l.memberid = m.memberid
WHERE l.date_of_return - l.date_borrowed = (SELECT MAX(date_of_return - date_borrowed) FROM loans)
/
Tables used:
LOANS:
Name Null? Type
----------------------------------------------------- -------- ------------------------------------
ISBN NOT NULL VARCHAR2(20)
SERIAL_NO NOT NULL NUMBER(2)
DATE_BORROWED NOT NULL DATE
DATE_OF_RETURN DATE
MEMBERID NOT NULL VARCHAR2(11)
EXTEND VARCHAR2(5)
MEMBERS:
Name Null? Type
----------------------------------------------------- -------- ------------------------------------
MEMBERID NOT NULL VARCHAR2(11)
LASTNAME NOT NULL VARCHAR2(20)
FIRSTNAME VARCHAR2(20)
Error:
ERROR at line 1: ORA-00937: not a single-group group function
I think I'm overlooking a simple solution. Thanks in advance.
Try
SELECT m.firstname || ' ' || m.lastname || ' has borrowed for the longest time: ' || ROUND(l.date_of_return - l.date_borrowed) || ' days' "Longest time borrowed"
FROM loans l
JOIN members m
ON l.memberid = m.memberid
WHERE l.date_of_return - l.date_borrowed = (SELECT MAX(l2.date_of_return - l2.date_borrowed) FROM loans l2)
AND ROWNUM <=1
You might want to avoid the additional calculation-based comparison outside of the inner select, which would likely end up as an additional loop over loans during execution, significantly stretching the run time. It should be possible to collect the member id inside the inner select, along with the calculation result, and use both outside.
Try something like this:
SELECT m.firstname, m.lastname, b.maxtime
FROM members m
INNER JOIN (
    SELECT li.memberid id, MAX(li.date_of_return - li.date_borrowed) maxtime
    FROM loans li
    GROUP BY li.memberid
) b ON m.memberid = b.id
ORDER BY b.maxtime DESC -- longest loan first
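If you only want the single longest borrower, on Oracle 12c or later you could also cap the ordered result (a sketch based on the query above):

SELECT m.firstname, m.lastname, b.maxtime
FROM members m
INNER JOIN (
    SELECT li.memberid id, MAX(li.date_of_return - li.date_borrowed) maxtime
    FROM loans li
    GROUP BY li.memberid
) b ON m.memberid = b.id
ORDER BY b.maxtime DESC
FETCH FIRST 1 ROW ONLY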
BTW: There's a pretty good post covering similar topics here (just in case you haven't found this one while searching), which might contain some interesting ideas for what you're trying to do: SQL select only rows with max value on a column
I am looking into the Oracle SQL MODEL clause. I am trying to write dynamic Oracle SQL that can be adapted to run for a varying number of columns each time, using this MODEL clause. However, I am struggling to see how to adapt it (even using PL/SQL) into a dynamic/generic query or procedure.
Here is a rough view of the table I am working on:
OWNER | ACCOUNT_YEAR | ACCOUNT_NAME | PERIOD_1 | PERIOD_2 | PERIOD_3 | PERIOD_4 | PERIOD_5 | PERIOD_6 | ...
------|--------------|--------------|----------|----------|----------|----------|----------|----------|----
 9640 |         2018 | something 1  |       34 |      444 |      982 |       55 |       42 |       65 |
 9640 |         2018 | something 2  |      333 |       65 |      666 |       78 |       44 |       55 |
 9640 |         2018 | something 3  |     6565 |      783 |       32 |       12 |       46 |      667 |
Here is what I have so far:
select OWNER, PERIOD_1, PERIOD_2, PERIOD_3, PERIOD_4, PERIOD_5, PERIOD_6, PERIOD_7, PERIOD_8,
       PERIOD_9, PERIOD_10, PERIOD_11, PERIOD_12, ACCOUNT_YEAR, ACCOUNT_NAME
from DATA_TABLE
where OWNER IN ('9640') and PERIOD_1 is not null
MODEL ignore nav
Return UPDATED ROWS
PARTITION BY (OWNER, ACCOUNT_NAME)
DIMENSION BY (ACCOUNT_YEAR)
MEASURES (PERIOD_1, PERIOD_2, PERIOD_3, PERIOD_4, PERIOD_5, PERIOD_6, PERIOD_7, PERIOD_8,
          PERIOD_9, PERIOD_10, PERIOD_11, PERIOD_12)
RULES
(
    PERIOD_1[2021] = PERIOD_1[2018] * 1.05,
    PERIOD_2[2021] = PERIOD_2[2018] * 1.05,
    PERIOD_3[2021] = PERIOD_3[2018] * 1.05,
    PERIOD_4[2021] = PERIOD_4[2018] * 1.05,
    PERIOD_5[2021] = PERIOD_5[2018] * 1.05,
    PERIOD_6[2021] = PERIOD_6[2018] * 1.05,
    PERIOD_7[2021] = PERIOD_7[2018] * 1.05,
    PERIOD_8[2021] = PERIOD_8[2018] * 1.05,
    PERIOD_9[2021] = PERIOD_9[2018] * 1.05,
    PERIOD_10[2021] = PERIOD_10[2018] * 1.05,
    PERIOD_11[2021] = PERIOD_11[2018] * 1.05,
    PERIOD_12[2021] = PERIOD_12[2018] * 1.05
)
ORDER BY ACCOUNT_YEAR asc;
As you can see in the MEASURES and RULES sections, I am currently hardcoding each period column into this query.
I want to be able to use this MODEL clause (specifically the rules part) in a flexible way, so I could run the query for, say, just periods 1-3, or periods 5-12...
I have looked into this, but all the examples show the left-hand side of a rule (e.g. PERIOD_12[2021] = ...) referring explicitly to a column of the table, rather than a parameter or variable I could simply swap in for something else.
Any help on how I might accomplish this through SQL or PL/SQL would be greatly appreciated.
First, you should try to avoid dynamic columns by changing the table structure to a simpler format. SQL is much simpler if you store the data vertically instead of horizontally - use multiple rows instead of multiple columns.
If you can't change the data structure, you still want to keep the MODEL query as simple as possible, because the MODEL clause is a real pain to work with. Transform the table from columns to rows using UNPIVOT, run a simplified MODEL query, and then transform the results back if necessary.
If you really, really need dynamic columns in a pure SQL statement, you'll either need to use an advanced data type like Gary Myers suggested, or use the Method4 solution below.
Sample Schema
To make the examples fully reproducible, here's the sample data I used, along with the MODEL query (which I had to slightly modify to only reference 6 variables and the new table name).
create table data_table
(
owner number,
account_year number,
account_name varchar2(100),
period_1 number,
period_2 number,
period_3 number,
period_4 number,
period_5 number,
period_6 number
);
insert into data_table
select 9640, 2018 ,'something 1', 34 , 444 , 982 , 55 , 42 , 65 from dual union all
select 9640, 2018 ,'something 2', 333 , 65 , 666 , 78 , 44 , 55 from dual union all
select 9640, 2018 ,'something 3', 6565 , 783 , 32 , 12 , 46 , 667 from dual;
commit;
MODEL query:
select OWNER, PERIOD_1, PERIOD_2, PERIOD_3, PERIOD_4, PERIOD_5, PERIOD_6, ACCOUNT_YEAR, ACCOUNT_NAME
from DATA_TABLE
where OWNER IN ('9640') and PERIOD_1 is not null
MODEL ignore nav
Return UPDATED ROWS
PARTITION BY (OWNER, ACCOUNT_NAME)
DIMENSION BY (ACCOUNT_YEAR)
MEASURES (PERIOD_1,PERIOD_2, PERIOD_3, PERIOD_4, PERIOD_5, PERIOD_6)
RULES
(
PERIOD_1[2021] = PERIOD_1[2018] * 1.05,
PERIOD_2[2021] = PERIOD_2[2018] * 1.05,
PERIOD_3[2021] = PERIOD_3[2018] * 1.05,
PERIOD_4[2021] = PERIOD_4[2018] * 1.05,
PERIOD_5[2021] = PERIOD_5[2018] * 1.05,
PERIOD_6[2021] = PERIOD_6[2018] * 1.05
)
ORDER BY ACCOUNT_YEAR, ACCOUNT_NAME asc;
Results:
OWNER PERIOD_1 PERIOD_2 PERIOD_3 PERIOD_4 PERIOD_5 PERIOD_6 ACCOUNT_YEAR ACCOUNT_NAME
----- -------- -------- -------- -------- -------- -------- ------------ ------------
9640 35.7 466.2 1031.1 57.75 44.1 68.25 2021 something 1
9640 349.65 68.25 699.3 81.9 46.2 57.75 2021 something 2
9640 6893.25 822.15 33.6 12.6 48.3 700.35 2021 something 3
UNPIVOT approach
This example uses static code to demonstrate the syntax, but this can also be made more dynamic if necessary, perhaps through PL/SQL that creates temporary tables.
create table unpivoted_data as
select *
from data_table
unpivot (quantity for period_code in (period_1 as 'P1', period_2 as 'P2', period_3 as 'P3', period_4 as 'P4', period_5 as 'P5', period_6 as 'P6'));
With unpivoted data, the MODEL clause becomes simpler. Instead of listing a rule for each period, simply partition by the PERIOD_CODE:
select *
from unpivoted_data
where OWNER IN ('9640')
and (OWNER, ACCOUNT_YEAR, ACCOUNT_NAME) in
(
select owner, account_year, account_name
from unpivoted_data
where period_code = 'P1'
and quantity is not null
)
MODEL ignore nav
Return UPDATED ROWS
PARTITION BY (OWNER, ACCOUNT_NAME, PERIOD_CODE)
DIMENSION BY (ACCOUNT_YEAR)
MEASURES (QUANTITY)
RULES
(
QUANTITY[2021] = QUANTITY[2018] * 1.05
)
ORDER BY ACCOUNT_YEAR, ACCOUNT_NAME, PERIOD_CODE;
Results:
OWNER ACCOUNT_YEAR ACCOUNT_NAME PERIOD_CODE QUANTITY
----- ------------ ------------ ----------- --------
9640 2018 something 1 P1 34
9640 2018 something 1 P2 444
9640 2018 something 1 P3 982
...
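If you need the wide layout back afterwards, the unpivoted results can be pivoted again. Here is a sketch with the six periods from the sample schema hardcoded (the IN list could also be generated dynamically):

select *
from unpivoted_data
pivot
(
    max(quantity)
    for period_code in ('P1' as period_1, 'P2' as period_2, 'P3' as period_3,
                        'P4' as period_4, 'P5' as period_5, 'P6' as period_6)
);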
Dynamic SQL in SQL
If you really need to do this all in one query, my open source package Method4 can help. Once the package is installed, you call it by passing in a query that generates the query you want to run.
This query returns the same results as the previous MODEL query, but will automatically adjust based on the columns in the table.
select * from table(method4.dynamic_query(
q'[
--Generate the MODEL query.
select
replace(replace(q'<
select OWNER, #PERIOD_COLUMN_LIST#, ACCOUNT_YEAR, ACCOUNT_NAME
from DATA_TABLE
where OWNER IN ('9640') and PERIOD_1 is not null
MODEL ignore nav
Return UPDATED ROWS
PARTITION BY (OWNER, ACCOUNT_NAME)
DIMENSION BY (ACCOUNT_YEAR)
MEASURES (#PERIOD_COLUMN_LIST#)
RULES
(
#RULES#
)
ORDER BY ACCOUNT_YEAR, ACCOUNT_NAME asc
>', '#PERIOD_COLUMN_LIST#', period_column_list)
, '#RULES#', rules) sql_statement
from
(
--List of columns.
select
listagg(column_name, ', ') within group (order by column_id) period_column_list,
listagg(column_name||'[2021] = '||column_name||'[2018] * 1.05', ','||chr(10)) within group (order by column_id) rules
from user_tab_columns
where table_name = 'DATA_TABLE'
and column_name like 'PERIOD%'
)
]'
));
Don't.
You can get an idea of the underlying obstruction if you understand the PARSE, BIND, EXECUTE flow of SQL, as demonstrated by the DBMS_SQL package:
https://docs.oracle.com/en/database/oracle/oracle-database/19/arpls/DBMS_SQL.html#GUID-BF7B8D70-6A09-4E04-A216-F8952C347BAF
A cursor is opened and an SQL statement is parsed once. After the parse, DESCRIBE_COLUMNS can be called, which tells you definitively which columns will be returned by executing that SQL statement. From that point you can do multiple BIND and EXECUTE rounds, putting different values for variables into the same statement and re-running it. Each EXECUTE may be followed by one or more FETCHes. None of the bind, execute, or fetch steps can affect which columns are returned (in number, name, order, or datatype).
The only way to change the columns returned is to parse a different SQL statement.
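A minimal sketch of that flow, assuming the DATA_TABLE from the earlier answer; DESCRIBE_COLUMNS reports the fixed column list immediately after the parse:

declare
    c      integer := dbms_sql.open_cursor;
    n_cols integer;
    cols   dbms_sql.desc_tab;
begin
    -- the column list is fixed at parse time; later binds and executes cannot change it
    dbms_sql.parse(c, 'select owner, period_1, period_2 from data_table',
                   dbms_sql.native);
    dbms_sql.describe_columns(c, n_cols, cols);
    for i in 1 .. n_cols loop
        dbms_output.put_line(cols(i).col_name);
    end loop;
    dbms_sql.close_cursor(c);
end;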
Depending on what you want at the end, you might be able to use a complex datatype (such as XML or JSON) to return data with different internal structures from the same statement (or even in different rows returned by the same statement).
I have a column that holds several items per row, and I need to count the number of times each item appears. My table looks something like this:
Table Example
Id_TR Triggered
-------------- ------------------
A1_6547 R1:23;R2:0;R4:9000
A2_1235 R2:0;R2:100;R3:-100
A3_5436 R1:23;R2:100;R4:9000
A4_1245 R2:0;R5:150
And I would like the result to be like this:
Expected Results
Triggered Count(1)
--------------- --------
R1:23 2
R2:0 3
R2:100 2
R3:-100 1
R4:9000 2
R5:150 1
I've tried to do some substring work, but I can't seem to figure out how to solve this problem. Can anyone help?
This solution is about 3 times faster than the CONNECT BY solution (performance: roughly 15K records per second):
with cte (token, suffix) as
(
    -- anchor: split off the first token; keep the rest as the suffix
    select substr(triggered || ';', 1, instr(triggered, ';') - 1) as token
          ,substr(triggered || ';', instr(triggered, ';') + 1)   as suffix
    from t
    union all
    -- recursion: keep splitting the suffix until it is exhausted
    select substr(suffix, 1, instr(suffix, ';') - 1) as token
          ,substr(suffix, instr(suffix, ';') + 1)    as suffix
    from cte
    where suffix is not null
)
select token, count(*)
from cte
group by token
;
with x as (
    select listagg(Triggered, ';') within group (order by Id_TR) str
    from t
)
select regexp_substr(str, '[^;]+', 1, level) element, count(*)
from x
connect by level <= length(regexp_replace(str, '[^;]+')) + 1
group by regexp_substr(str, '[^;]+', 1, level);
First concatenate all values of Triggered into one list using LISTAGG, then parse the list and do the GROUP BY.
You can find other methods of parsing lists here or here.
This is a fair solution (performance: roughly 5K records per second):
select triggered
      ,count(*) as cnt
from (
      select id_tr
            ,regexp_substr(triggered, '[^;]+', 1, level) as triggered
      from t
      connect by id_tr = prior id_tr
             and level <= regexp_count(triggered, ';') + 1
             and prior sys_guid() is not null -- non-deterministic value defeats cycle detection when splitting row by row
     ) t
group by triggered
;
This one is just for learning purposes; check my other solutions (performance: roughly 1K records per second):
select x.triggered
      ,count(*)
from t
    ,xmltable
     (
         '/r/x'
         passing xmltype('<r><x>' || replace(triggered, ';', '</x><x>') || '</x></r>')
         columns triggered varchar2(100) path '.'
     ) x
group by x.triggered
;