Loading data to a new table with row prefixes to reflect the original source in SQL Server

I am trying to clean and reconcile some datasets.
I have my original tables in raw format (all strings), and I am copying the information from the source to another table I have created with the right data types.
However, I would like the new data in the new table to reflect the original source table in the form of a prefix (e.g. 1_blablabla), so that in the new table I would have:
1_blablabla
2_blablabla
3_blablabla
and so on...
So far I am creating the new table as follows:
CREATE TABLE table_name (
    column1_name DATATYPE(XX),
    column2_name DATATYPE(XX),
    column3_name DATATYPE(XX),
    column4_name DATATYPE(XX)
);
And then inserting the data from the old table with the raw data, to the new table with the "correct" format, like this:
INSERT INTO new_table (column1_name, column2_name, column3_name, column4_name)
SELECT sourcecolumn1, sourcecolumn2, sourcecolumn3, sourcecolumn4
FROM sourcetable1;
However, this loads the data exactly as it is in the source table.
Instead, I would like at least one of the columns to carry a prefix such as "1_", so that it is easy to see which source table each row came from.
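For example, assuming column1_name is a character type in the new table, a query along these lines (reusing the placeholder names from above) tags every row copied from sourcetable1 with a "1_" prefix; you would use '2_', '3_', and so on for the other source tables, and CAST/CONVERT the remaining columns as needed:
INSERT INTO new_table (column1_name, column2_name, column3_name, column4_name)
SELECT
    CONCAT('1_', sourcecolumn1),  -- the prefix marks rows coming from sourcetable1
    sourcecolumn2,
    sourcecolumn3,
    sourcecolumn4
FROM sourcetable1;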


MS Access - Enter Parameter Value on INSERT INTO

I have a database, which contains information that I can't share images of due to compliance reasons.
I have a table I need to copy data from, so I was using the following SQL:
INSERT INTO completedtrainingstestfinal (MALicenseNum)
SELECT MALicenseNum
FROM CompletedTrainings
WHERE (CompletedTrainings.MALicenseNum IS NOT NULL)
AND (CompletedTrainings.Employee = completedtrainingstestfinal.Employee);
It keeps popping up the Enter Parameter Value dialog, centered on the new table (named completedtrainingstestfinal) at the Employee column.
Background: the original table is a mess and this is to be the replacement table. I've had to pivot the table in order to clean it up, and am now trying to remove an ungodly amount of nulls. The goal is to clean up the query process for the end users, who need to put in training and certification/recertification through the forms.
When you look at the old table, it has been designed to reference another table and display the actual names, but, as seen in the image below, it is actually storing the data as the integer Employee number.
The new table's Employee column was a direct copy, but it only displays the integer. My instincts tell me that the problem is here, but I have been unable to find a solution. Does anyone have any suggestions to point me in the right direction?
Edited to add: It might be an issue where the tables have different numbers of rows?
This is the design view of the two relevant tables: Table 1 and Table 2 (screenshots).
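Not knowing the full design, one guess: the Enter Parameter Value prompt usually appears because completedtrainingstestfinal.Employee is referenced in the WHERE clause but that table is never named in the FROM clause, so Access treats it as a parameter. If the intent is to fill MALicenseNum on the rows of the new table whose Employee already matches, a sketch along these lines (assuming Employee is the join key in both tables) avoids the prompt:
UPDATE completedtrainingstestfinal
INNER JOIN CompletedTrainings
    ON completedtrainingstestfinal.Employee = CompletedTrainings.Employee
SET completedtrainingstestfinal.MALicenseNum = CompletedTrainings.MALicenseNum
WHERE CompletedTrainings.MALicenseNum IS NOT NULL;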

Convert the latest data pull of a raw Variant table into a normal table: Snowflake

I have a variant table where raw JSON data is stored in a column called "raw", as shown here.
Each row of this table is a full data pull from an API, ingested via Snowpipe. Within the JSON there is a 'pxQueryTimestamp' key and value pair. The latest value for this field should have the most up-to-date data. How would I go about normalizing only this row?
Usually my way around this is to only pipe over the latest data from S3, so that this table has only one row, and then I normalize that.
I'd like to have a historic table of all data pulls, as shown below, but when normalizing we only care about the most up-to-date data.
Any help is appreciated!
If you are saying that you want to flatten and retain everything in the most current variant record, then I'd suggest leveraging a STREAM object in Snowflake, which would then contain only the latest variant record. You could then TRUNCATE your flattened table and run an insert from the STREAM object into your flattened table, which moves the offset forward and leaves your STREAM empty.
Take a look at the documentation here:
https://docs.snowflake.net/manuals/user-guide/streams.html
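A minimal sketch of that pattern, assuming placeholder names raw_pulls (the variant table), flattened_pull (the target), and a results array inside the JSON:
-- Stream so only variant rows added since the last load are visible
CREATE OR REPLACE STREAM raw_pulls_stream ON TABLE raw_pulls;

-- After each Snowpipe load: rebuild the flattened table from the newest pull,
-- then consume the stream (the INSERT advances its offset)
TRUNCATE TABLE flattened_pull;

INSERT INTO flattened_pull (query_ts, item)
SELECT
    r.raw:pxQueryTimestamp::timestamp AS query_ts,
    f.value                           AS item
FROM raw_pulls_stream r,
     LATERAL FLATTEN(input => r.raw:results) f;  -- 'results' path is a guess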

Kettle Pentaho backup transformation by latest data

I need to synchronize some data from one database to another using a Kettle/Spoon transformation. The logic is: I need to select the latest date of the data that already exists in the destination DB, then select from the source DB everything after that date. What transformation element do I need to do this?
Thank you.
There can be many solutions:
If you have timestamp columns in both the source and destination tables, then you can take two Table Input steps. In the first one, just select the max last-updated timestamp; use it as a variable in the next Table Input, taking it as a filter for the source data. You can do something like this:
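A rough sketch of the SQL those two Table Input steps might run (all names are placeholders; in Spoon the second step would read the first step's result via "Insert data from step" and fill the ? parameter):
-- Table Input 1: newest timestamp already present in the destination
SELECT MAX(last_updated) AS max_ts
FROM destination_table;

-- Table Input 2: pull only newer rows from the source (? comes from step 1)
SELECT *
FROM source_table
WHERE last_updated > ?;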
If you just want the new data to be updated in the destination table and you don't care much about timestamps, I would suggest you use the Insert/Update step for output. It will bring all the data to the stream and, if it finds a match, it won't insert anything. If it doesn't find a match, it will insert a new row; if it finds any modifications to an existing row in the destination table, it will update it accordingly.

How to add a new BigQuery table data to Tableau extract?

Is it possible to add data from a new BigQuery table to existing Tableau extract?
For example, there are BigQuery tables partitioned by date, like access_20160101, access_20160102, ..., and data from 2016/01/01 to 2016/01/24 is already in the Tableau Server extract. Now a new table for 2016/01/25, access_20160125, has been created and I want to add its data to the existing extract, but I don't want to read the old tables, because there is no change in them and loading them would be charged by Google.
If I understand correctly: you created an extract for a table in BigQuery and now you want to append data in a different table to that extract.
As long as both tables have exactly the same column names and data types in those columns you can do this:
Create an extract from the new table.
Append that extract to the old one. (see: add data from a file)
Now you have one extract with the data of both tables.

SSIS Check Excel source rows redirect rows to another table on 'x' number of field matches

I work in a sales based environment and our data consists of 'leads'.
Let's say we record CompanyName, PhoneNumber, Address1 & PostCode (ZIP). These rows are seeded with a unique ID in the schema.
The leads come in from various sources, are compiled onto a spreadsheet, and are then imported into SQL Server 2012 using SSIS.
After a validation check to see if a file exists, we then use a simple data flow which consists of an Excel Source, Derived Column, Data Conversion and finally an OLE DB Destination.
I'm sure my requirement has a relatively simple solution; I understand that what I need to achieve is the first step. I need to take a sample of data from the last rolling two months; if two or more fields in the source Excel file match the corresponding fields in the destination SQL table, then I want to redirect the row to another table.
I am unsure which combination of components I could use to achieve this. I believe that Fuzzy Lookup may not be what I am looking for, as I need exact field matches. I have looked at the Lookup component, but I am unsure if this is the way to go.
Could anyone please provide some advice on how I can best achieve this as simply as possible?
You can use the Lookup component to check for matches in your existing table. However, it will be fairly complicated to implement the requirement of checking for any two or more fields matching. Your expression would be long and complex, basically consisting of (using pseudocode for readability):
IIF((src.a = dst.a AND src.b = dst.b) OR (src.a = dst.a AND src.c = dst.c) OR (src.b = dst.b AND src.c = dst.c) OR ... , ...)
...and so on, for every combination of two columns you want to test.
I would do this by importing the entire spreadsheet into a staging table, and doing the existing-rows check in a SQL stored procedure that moves the data to the desired destination table.
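As a rough sketch of that stored procedure's core check (Leads, StagingLeads, LeadsForReview and CreatedDate are all assumed names), counting field matches against leads from the last rolling two months:
-- Staged rows that match an existing recent lead on two or more fields
-- are redirected to the review table; the remainder can be inserted into
-- the main table with a NOT EXISTS version of the same predicate.
INSERT INTO LeadsForReview (CompanyName, PhoneNumber, Address1, PostCode)
SELECT s.CompanyName, s.PhoneNumber, s.Address1, s.PostCode
FROM StagingLeads AS s
WHERE EXISTS (
    SELECT 1
    FROM Leads AS l
    WHERE l.CreatedDate >= DATEADD(MONTH, -2, GETDATE())
      AND (CASE WHEN l.CompanyName = s.CompanyName THEN 1 ELSE 0 END)
        + (CASE WHEN l.PhoneNumber = s.PhoneNumber THEN 1 ELSE 0 END)
        + (CASE WHEN l.Address1    = s.Address1    THEN 1 ELSE 0 END)
        + (CASE WHEN l.PostCode    = s.PostCode    THEN 1 ELSE 0 END) >= 2
);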