Create temp table in Azure Databricks and insert lots of rows - sql

Here's the end result of what I'm trying to do, because I think that I'm making it needlessly complicated.
I want to query data where UPC_ID IN (VERY LONG LIST OF UPCS). Like, 20k lines.
I thought that perhaps the easiest way to do this would be to create a temporary table, and insert the lines 1000 at a time (and then use that table for the WHERE condition.)
When I try to run
CREATE TABLE #TEMP_ITEM (UPC_ID BIGINT NOT NULL)
I get
[PARSE_SYNTAX_ERROR] Syntax error at or near '#'line 1, pos 13
The list of UPCs comes from a spreadsheet, and there's no shared attributes where I can just SELECT INTO or generate the list using anything that already exists in the database.
I know that I'm missing something painfully stupid here, but I am stuck. Help?

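You are almost there...
I hope this is in Databricks....
The #TEMP syntax is SQL Server's and is not valid in Spark SQL. Basically we cannot insert into a temporary view directly, but you can simulate it by defining the view from the output of SELECT statements...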
CREATE OR REPLACE TEMPORARY VIEW TEMP_ITEM AS
SELECT 1 AS UPC_ID
UNION
SELECT 2 AS UPC_ID
...
...
For further details, see the related question:
Is it possible to insert into temporary table in spark?
Hope this helps...
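For a list as long as the 20k UPCs in the question, writing the SELECT branches by hand is impractical. A minimal Python sketch that generates the view definition from a list of UPCs could look like the following. It assumes Spark SQL's inline-table form (VALUES ... AS t(col)) instead of the UNION chain in the answer, the helper name build_temp_view_sql is hypothetical, and reading the spreadsheet is left out.

```python
def build_temp_view_sql(upcs, view_name="TEMP_ITEM", column="UPC_ID"):
    """Return a CREATE OR REPLACE TEMPORARY VIEW statement listing every UPC.

    Hypothetical helper: the inline VALUES form is assumed to be accepted
    by the Spark SQL dialect in use.
    """
    # Coerce to int so that only numeric literals end up in the SQL text,
    # and de-duplicate so the view holds each UPC once.
    unique_upcs = sorted({int(u) for u in upcs})
    if not unique_upcs:
        raise ValueError("at least one UPC is required")
    rows = ",\n  ".join(f"({u})" for u in unique_upcs)
    return (
        f"CREATE OR REPLACE TEMPORARY VIEW {view_name} AS\n"
        f"SELECT * FROM VALUES\n  {rows}\nAS t({column})"
    )


if __name__ == "__main__":
    print(build_temp_view_sql([12345678901, 12345678902, 12345678901]))
```

The generated statement would be run once, after which the main query can filter with WHERE UPC_ID IN (SELECT UPC_ID FROM TEMP_ITEM).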

Related

Add column with substring of other column in SQL (Snowflake)

I feel like this should be simple but I'm relatively unskilled in SQL and I can't seem to figure it out. I'm used to wrangling data in python (pandas) or Spark (usually pyspark) and this would be a one-liner in either of those. Specifically, I'm using Snowflake SQL, but I think this is probably relevant to a lot of flavors of SQL.
Essentially I just want to trim the first character off of a specific column. More generally, what I'm trying to do is replace a column with a substring of the same column. I would even settle for creating a new column that's a substring of an existing column. I can't figure out how to do any of these things.
One obvious solution would be to create a temporary table with something like
CREATE TEMPORARY TABLE tmp_sub AS
SELECT id_col, substr(id_col, 2, 10) AS id_col_sub FROM table1
and then join it back and write a new table
CREATE TABLE table2 AS
SELECT
b.id_col_sub as id_col,
a.some_col1, a.some_col2, ...
FROM table1 a
JOIN tmp_sub b
ON a.id_col = b.id_col
My tables have roughly a billion rows though and this feels extremely inefficient. Maybe I'm wrong? Maybe this is just the right way to do it? I guess I could replace the CREATE TABLE table2 AS... to INSERT OVERWRITE INTO table1 ... and at least that wouldn't store an extra copy of the whole thing.
Any thoughts and ideas are most welcome. I come at this humbly from the perspective of someone who is baffled by a language that so many people seem to have mastery over.
I'm not sure of the exact syntax or functions in Snowflake, but generally speaking there are a few different ways of achieving this.
The general approach that should work almost everywhere is the SUBSTRING function, which most databases provide in some form.
Assuming you have a table called Table1 with the following data:
+-------+-----------------------------------------+
| Code  | Desc                                    |
+-------+-----------------------------------------+
| 0001  | 1First Character Will be Removed        |
| 0002  | xCharacter to be Removed                |
+-------+-----------------------------------------+
The SQL code to remove the first character would be:
select SUBSTRING(Desc,2,len(desc)) from Table1
Please note that the name of the "SUBSTRING" function varies between databases. In Oracle, for example, the function is "SUBSTR". You just need to find the Snowflake equivalent.
Another approach, which would work at least in SQL Server and MySQL, is the "RIGHT" function (note that MySQL spells the length function LENGTH or CHAR_LENGTH rather than len):
select RIGHT(Desc,len(Desc) - 1) from Table1
Based on your question I assume you actually want to update the actual data within the table. In that case you can use the same function above in an update statement.
update Table1 set Desc = SUBSTRING(Desc,2,len(desc))
You didn't try this?
UPDATE tableX
SET columnY = substr(columnY, 2, 10 ) ;
-Paul-
There is no need to specify the length, as the following simple test harness shows:
SELECT $1
,SUBSTR($1, 2)
,RIGHT($1, -2)
FROM VALUES
('abcde')
,('bcd')
,('cdef')
,('defghi')
,('e')
,('fg')
,('')
;
Both expressions here - SUBSTR(<col>, 2) and RIGHT(<col>, -2) - effectively remove the first character of the <col> column value.
As for the strategy of using UPDATE versus INSERT OVERWRITE, I do not believe that there will be any difference in performance or outcome, so I might opt for the UPDATE since it is simpler. So, in conclusion, I would use:
UPDATE tableX
SET columnY = SUBSTR(columnY, 2)
;
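To see the two-argument SUBSTR form in action without a Snowflake account, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here (its SUBSTR(col, 2) also returns everything from the second character onward), the sample values are taken from the test harness above, and the helper name strip_first_char is hypothetical.

```python
import sqlite3


def strip_first_char(conn, table, column):
    """Overwrite `column` with itself minus its first character."""
    # Identifiers cannot be bound as parameters, so they are interpolated;
    # only pass trusted table/column names here.
    conn.execute(f"UPDATE {table} SET {column} = SUBSTR({column}, 2)")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (columnY TEXT)")
conn.executemany(
    "INSERT INTO tableX VALUES (?)",
    [("abcde",), ("e",), ("",)],
)
strip_first_char(conn, "tableX", "columnY")

result = [row[0] for row in conn.execute("SELECT columnY FROM tableX ORDER BY rowid")]
print(result)
```

Single-character and empty values simply become empty strings, so no special casing is needed in this sketch.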

INSERT FROM EXISTING SELECT without amending

With GDPR looming on the horizon in the UK, and with a team of 15 users already creating spurious SELECT statements (in excess of 2,000) across 15 different databases, I need a way to capture an existing SELECT statement and assign surrogate keys/data WITHOUT rewriting every procedure we already have.
Team members will still need to run their original scripts as normal, and there will also be a requirement to pseudonymise the values.
My current thinking is to create a stored procedure along the lines of:
CREATE PROC Pseudo (@query NVARCHAR(MAX))
INSERT INTO #TEMP FROM @query
Do something with the data via a mapping table of real and surrogate/pseudo data.
UPDATE #TEMP
SET FNAME = (SELECT Pseudo_FNAME FROM PseudoTable PT WHERE #TEMP.FNAME = PT.FNAME)
SELECT * FROM #TEMP
So that team members can run their normal SELECT statements and get pseudo data simply by using:
EXEC Pseudo (SELECT FNAME FROM CUSTOMERS)
The problem I'm having is that you can't use:
INSERT INTO #TEMP FROM @query
So I tried via a CTE:
WITH TEMP AS (@query)
...but I can't use that either.
Surely there's a way of capturing the recordset from an existing select that I can pull into a table to amend it or capture the SELECT statement; without having to amend the original script. Please bear in mind that each SELECT statement will be unique so I can't write COLUMN or VALUES etc.
Does anyone have any ideas or a working example of how best to tackle this?
There are other lengthy methods I could externally do to carry this out but I'm trying to resolve this within SQL if possible.
So after a bit of deliberation I resolved it.
I passed the original SELECT statement as a string to a stored procedure that uses dynamic SQL (essentially SQL injection) to build and execute an INSERT of the query's results. I then ran the UPDATE against that dataset.
The end result was: EXEC Pseudo(' Original SQL ;')
I will have to set some basic rules around certain columns for now as a short-term fix, but at least users can create non-pseudo and pseudo data as required without masses of reworking :)
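The idea of wrapping an arbitrary SELECT, materialising its result, and then rewriting a column through a mapping table can be sketched as follows. This is not the poster's actual procedure: it uses Python's sqlite3 module as a stand-in for SQL Server, the function name pseudo and the table pseudo_tmp are hypothetical, and the sample rows are invented. Like any dynamic-SQL approach, it splices the caller's query into the statement, so it is only suitable for trusted callers.

```python
import sqlite3


def pseudo(conn, query, column="FNAME"):
    """Run `query`, then replace values in `column` with their pseudonyms.

    WARNING: `query` is spliced into SQL as-is (dynamic SQL), so this is
    only safe when the query text comes from a trusted source.
    """
    conn.execute("DROP TABLE IF EXISTS temp.pseudo_tmp")
    # Materialise the caller's result set without knowing its columns.
    conn.execute(f"CREATE TEMP TABLE pseudo_tmp AS {query}")
    # Same shape as the UPDATE in the question: values with no mapping
    # row become NULL rather than leaking the real value.
    conn.execute(
        f"""
        UPDATE pseudo_tmp
        SET {column} = (
            SELECT PT.Pseudo_FNAME FROM PseudoTable PT
            WHERE PT.FNAME = pseudo_tmp.{column})
        """
    )
    return conn.execute("SELECT * FROM pseudo_tmp").fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (FNAME TEXT);
INSERT INTO CUSTOMERS VALUES ('Alice'), ('Bob');
CREATE TABLE PseudoTable (FNAME TEXT, Pseudo_FNAME TEXT);
INSERT INTO PseudoTable VALUES ('Alice', 'Person-1'), ('Bob', 'Person-2');
""")
rows = pseudo(conn, "SELECT FNAME FROM CUSTOMERS")
print(rows)
```

In SQL Server the equivalent materialisation step would be built as a string and executed dynamically, which is presumably what the stored procedure described above does.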

DB2 Stored Procedures- looping through values?

Okay, so I'm a novice at writing stored procedures. I'm trying to perform a function similar to a foreach() you would see in a programming language. Right now I have a temp table populated with the values I'd like to loop through. I would like to (for each value in this table) execute a SQL statement based upon that value. So, here's my pseudocode to illustrate what I'm really after here:
foreach(value in my temp table) {
SELECT * FROM TABLE WHERE column_x = value
}
Now, I know nothing about stored procedures, so how can I get this done? Here's my script so far:
DROP TABLE SESSION.X;
CREATE GLOBAL TEMPORARY TABLE
SESSION.X (
TD_NAME CHAR(30)
);
INSERT INTO
SESSION.X
SELECT DISTINCT
TD_NAME
FROM
DBA.AFFIN_PROG_REPORT
WHERE
TD_NAME IS NOT NULL;
Any help is very much appreciated!
You need, for example, a cursor.
See the example: https://stackoverflow.com/a/4975012/3428749
See the documentation: https://msdn.microsoft.com/pt-br/library/ms180169(v=sql.120).aspx
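Conceptually, a cursor fetches one row at a time from a result set so that a statement can be run for each value. The sketch below shows that pattern from client code with Python's sqlite3 module rather than in DB2 SQL PL, so it illustrates the loop, not DB2 syntax. The AMOUNT column and the sample rows are hypothetical; the table and column names otherwise follow the script in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AFFIN_PROG_REPORT (TD_NAME TEXT, AMOUNT INTEGER);
INSERT INTO AFFIN_PROG_REPORT VALUES ('A', 1), ('B', 2), ('A', 3), (NULL, 4);
CREATE TEMP TABLE X AS
  SELECT DISTINCT TD_NAME FROM AFFIN_PROG_REPORT WHERE TD_NAME IS NOT NULL;
""")

per_value = {}
# Read the loop values first, then run one query per value.
names = conn.execute("SELECT TD_NAME FROM X ORDER BY TD_NAME").fetchall()
for (name,) in names:
    rows = conn.execute(
        "SELECT AMOUNT FROM AFFIN_PROG_REPORT WHERE TD_NAME = ? ORDER BY AMOUNT",
        (name,),
    ).fetchall()
    per_value[name] = [amount for (amount,) in rows]

print(per_value)
```

When the per-value statement is just a SELECT, a single query joining the temp table to the target table usually does the same work without any loop.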

Very simple SQL query on varchar fields with sqlite

I created a table with this schema using sqlite3:
CREATE TABLE monitored_files (file_id INTEGER PRIMARY KEY,file_name VARCHAR(32767),original_relative_dir_path VARCHAR(32767),backupped_relative_dir_path VARCHAR(32767),directory_id INTEGER);
Now I would like to get all the records where original_relative_dir_path is exactly equal to '.' (a single dot, without the quotes). What I did is this:
select * from monitored_files where original_relative_dir_path='.';
The result is no records even if in the table I have just this record:
1|'P9040479.JPG'|'.'|'.'|1
I read up on the web and I see no mistakes in my syntax... I also tried using LIKE '.', but still no results. I'm not an expert in SQL, so maybe you can see something wrong?
Thanks!
I see no problem with the statement.
I created the table that you described.
Did an INSERT with the same values that you provided.
And did the query, and also queried without a where clause.
No problems encountered, so I suspect that when you execute your selection, you may not be connected to the correct database.
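Another possibility, not confirmed anywhere in the thread: the record in the question is printed with quote characters around each text value, and the sqlite3 shell's default output mode does not add quotes, so the quotes may be part of the stored data. The sketch below, using Python's sqlite3 module and a shortened version of the table, shows how that assumption would produce exactly the reported symptom.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE monitored_files ("
    "file_id INTEGER PRIMARY KEY, file_name TEXT, "
    "original_relative_dir_path TEXT)"
)
# Assumption: the application stored the values with the quotes included.
conn.execute(
    "INSERT INTO monitored_files VALUES (?, ?, ?)",
    (1, "'P9040479.JPG'", "'.'"),
)

# Comparing against a bare dot finds nothing...
plain = conn.execute(
    "SELECT file_id FROM monitored_files WHERE original_relative_dir_path = '.'"
).fetchall()
# ...while comparing against the three-character string '.' matches.
quoted = conn.execute(
    "SELECT file_id FROM monitored_files WHERE original_relative_dir_path = ?",
    ("'.'",),
).fetchall()
print(plain, quoted)
```

If that is what happened, the fix belongs in the code that inserts the rows (bind the raw value instead of a pre-quoted string) rather than in the SELECT.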

how to convert result of an select sql query into a new table in ms access

How can I convert the result of a SELECT SQL query into a new table in MS Access?
You can use sub queries
SELECT a,b,c INTO NewTable
FROM (SELECT a,b,c
FROM TheTable
WHERE a Is Null)
Like so:
SELECT *
INTO NewTable
FROM OldTable
First, create a table with the required keys, constraints, domain checking, references, etc. Then use an INSERT INTO..SELECT construct to populate it.
Do not be tempted by SELECT..INTO..FROM constructs. The resulting table will have no keys, therefore will not actually be a table at all. Better to start with a proper table then add the data e.g. it will be easier to trap bad data.
For an example of how things can go wrong with a SELECT..INTO clause: it can result in a column that includes the NULL value. After the event you can change the column to NOT NULL, but the engine will not replace the NULLs, so you will end up with a NOT NULL column containing NULLs!
Also consider creating a 'viewed' table e.g. using CREATE VIEW SQL DDL rather than a base table.
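The "create the table first, then INSERT INTO..SELECT" advice can be illustrated with a short sketch. It runs against SQLite through Python's sqlite3 module as a stand-in for Access, so the DDL details differ from Access SQL; the table names follow the earlier answers and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TheTable (a INTEGER, b TEXT, c TEXT);
INSERT INTO TheTable VALUES (1, 'x', 'p'), (2, 'y', NULL);

-- Declare keys and constraints up front...
CREATE TABLE NewTable (
    a INTEGER PRIMARY KEY,
    b TEXT NOT NULL,
    c TEXT NOT NULL
);
""")

# ...then populate it from a query. Good rows go in normally.
conn.execute(
    "INSERT INTO NewTable (a, b, c) SELECT a, b, c FROM TheTable WHERE a = 1"
)

# A row that violates a constraint is rejected at insert time.
try:
    conn.execute(
        "INSERT INTO NewTable (a, b, c) SELECT a, b, c FROM TheTable WHERE a = 2"
    )
    bad_data_trapped = False
except sqlite3.IntegrityError:
    bad_data_trapped = True

rows = conn.execute("SELECT a, b, c FROM NewTable").fetchall()
print(rows, bad_data_trapped)
```

Because the constraints exist before any data arrives, the NULL never reaches the new table, which is the "easier to trap bad data" point made above.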
If you want to do it through the user interface, you can also:
A) Create and test the select query. Save it.
B) Create a make table query. When asked what tables to show, select the query tab and your saved query.
C) Tell it the name of the table you want to create.
D) Go make coffee (depending on taste and size of table)
Select *
Into newtable
From somequery