I have this data stored in SQL Server.
How can I delete the first three rows by referencing the row numbers shown on the left (1, 2, 3)?
You can't. That row number is not tangible; it is nothing more than the order the results were returned in. SQL does not guarantee the order of data, so there is no rule that says if you run the same query 20 times, you'll get the same results at 1, 2, and 3 each time. That's not to say you won't get the same results, they're just not guaranteed. You need to delete using a column that actually exists as part of the table definition, such as F1, F2, etc.
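For example, a minimal sketch of such a delete, where YourTable and the filter values are placeholders for your actual table name and whatever values in F1 identify the rows you want gone:

DELETE FROM YourTable
WHERE F1 IN ('value1', 'value2', 'value3');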
As others have suggested in the comments, try to clean up the data before you import it into SQL Server. You have a few options:
Delete the rows from the file before importing.
Configure the Import Wizard correctly to exclude those rows.
Helpful link
https://learn.microsoft.com/en-us/sql/relational-databases/import-export/import-data-from-excel-to-sql?view=sql-server-ver15
Within Excel, I have connected my Microsoft SQL Server database to display results. The system I have set in place is built off of a form. If a user chooses option 1, the query results will show:
Select person, car, house from mytable1
If the user chooses option 2, the query result will show:
Select job, person, land, truck from mytable2
The very first select statement gives me a table in the column order I would like. However, as a user uses the form again, it re-queries with whichever select statement is requested. When the re-query happens, the columns come back in different positions, even if the select statement lists them in the same order. Is there a way I could force the columns into a specific order?
I've attempted to uncheck "Preserve Column Sort" within the Data Range Properties, but that ends up leaving empty columns, i.e. Column1, Column2, Column3, etc.
You may already know this, but since Excel allows you to move the columns in a table / ListObject around, it seeks to preserve any changes you make. So, if you run a query:
select one, two, three
And then move the column "three" in front of "one," when you re-run your query, it will keep them in that order in the ListObject, even though the query said otherwise.
This also means if you add a column, no matter where you add it, it will go to the end when MS Query renders the output.
select four, one, two, three
("four" goes to the end in Excel, even though it was listed first in SQL)
In your example, the column "person" was common across the two queries, so Excel (MS Query) would move it to the first position and put all other columns after that.
When Excel deletes the old columns, it leaves a tracer behind -- you may notice the columns that follow your table aren't the normal size; they are the size of the fields that were deleted. I call them "ghosts."
This is a serious hack, but the only way I know of to alleviate this problem is to run a bogus query (i.e. select 1), delete the ghosts by removing the entire columns, and then run your second query. Here is some ugly code I use in VBA to do this:
' Point to the table (ListObject) that holds the query results
Dim lo As ListObject
Set lo = Sheets("Sheet1").ListObjects("Table_ExternalData_1")

' Delete every column from the first column to the right of the
' table out to the last column of the sheet (XFD), ghosts included
Range(lo.HeaderRowRange.Offset(0, lo.HeaderRowRange.Columns.Count), _
      Range("XFD1")).EntireColumn.Delete
Yes, this deletes every column after the table, which means if there is useful data above or below the table in columns after the table, those are wiped out.
Maybe there is a better way -- I'm curious to see if you get any other solutions.
I'm still relatively new to SQL and Pentaho.
I've pulled a table with two different IDs and need to run a query for each specific instance.
For example,
SELECT *
FROM Table
WHERE RecordA = 'value in column A'
AND RecordB = 'value in column B'
I need the results back, either appended to new columns in the original table or part of their own text file output.
I was initially looking at using a formula for this inside of Pentaho, but couldn't quite figure it out. Since I have the query written, I threw it into Excel and got the concatenated results (so a string of 350 or so queries that I need to run). I'm just not sure how to accomplish this - I tried the Execute SQL script step inside of Pentaho, but it doesn't seem to produce output.
Any direction would be useful. I've searched a little but have come up short so far, possibly because I am still pretty new to this platform.
You can accomplish this in a lot of ways, with a "Database Lookup" step for example, but I usually do it in a quite easy way, and here is an example for your tests. I hope it helps.
The idea here is to have two Table input steps: the first one fetches the IDs we want to look at, for example with a SQL query like the sketch below. The result will be a one-column stream of rows.
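A minimal sketch, reusing the table and column names from the question as placeholders:

SELECT DISTINCT RecordA
FROM Table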
Next we have a Table input step that reads the rows received and executes its query once for each of them: in its options, set "Insert data from step" to the first Table input and tick "Execute for each row?".
What it does is replace a placeholder '?' with the data that is received. If you need two columns, use two '?'s, but remember that it replaces the first '?' with the first column of the incoming row and the second '?' with the second column, as in the sketch below.
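For the two-ID case in the question (with Table, RecordA, and RecordB again as placeholders, and the first Table input changed to return both columns), the second Table input's query might look like:

SELECT *
FROM Table
WHERE RecordA = ?
  AND RecordB = ?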
And you are good to go. Test it a couple of times and good luck.
I am creating a transformation that takes input from a CSV file and outputs to a table. It runs correctly, but the problem is that if I run the transformation more than once, the output table contains the duplicate rows again and again.
Now I want to remove all duplicate rows from the output table.
And if I run the transformation repeatedly, it should not affect the output table unless there are new rows.
How can I solve this?
Two solutions come to my mind:
Use an Insert / Update step instead of a Table output step to store data into the output table. It will try to find a row in the output table that matches the incoming stream row according to the key fields (all fields / columns in your case) you define. It works like this:
If the row can't be found, it inserts the row. If it can be found and the fields to update are the same, nothing is done. If they are not all the same, the row in the table is updated.
Use the following parameters; a rough SQL sketch of the resulting behavior follows the list:
The keys to look up the values: tableField1 = streamField1; tableField2 = streamField2; tableField3 = streamField3; and so on..
Update fields: tableField1, streamField1, N; tableField2, streamField2, N; tableField3, streamField3, N; and so on..
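As a SQL analogue of what the step does per incoming row (outputTable, the tableField columns, and the @streamField variables are all placeholders; with every update flag set to N, rows that are found are simply left untouched):

-- Insert only when no row matches the lookup keys
IF NOT EXISTS (SELECT 1 FROM outputTable
               WHERE tableField1 = @streamField1
                 AND tableField2 = @streamField2
                 AND tableField3 = @streamField3)
    INSERT INTO outputTable (tableField1, tableField2, tableField3)
    VALUES (@streamField1, @streamField2, @streamField3);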
If duplicate values have already been stored in the output table, you can remove them using this concept:
Use an Execute SQL script step where you define SQL that removes the duplicate entries and keeps only unique rows. You can take inspiration from here: How can I remove duplicate rows?
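For instance, a sketch assuming a SQL Server target, where outputTable and the PARTITION BY columns are placeholders for your table and its key fields:

-- Number the copies within each group of identical keys,
-- then delete everything past the first copy
WITH numbered AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY tableField1, tableField2, tableField3
               ORDER BY (SELECT NULL)
           ) AS rn
    FROM outputTable
)
DELETE FROM numbered
WHERE rn > 1;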
Another way is to use the Merge rows (diff) step, followed by a Synchronize after merge step.
As long as the number of rows in your CSV that differ from your target table is below 20-25% of the total, this is usually the most performance-friendly option.
Merge rows (diff) takes two input streams that must be sorted on its key fields (by a compatible collation), and generates the union of the two inputs with each row marked as "new", "changed", "deleted", or "identical". This means you'll have to put Sort rows steps on the CSV input and possibly the input from the target table if you can't use an ORDER BY clause. Mark the CSV input as the "Compare" row origin and the target table as the "Reference".
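If the target-table side can be sorted by the database itself, an ORDER BY on its Table input (target_table and the key fields here are placeholders) avoids a Sort rows step on that branch:

SELECT *
FROM target_table
ORDER BY key_field1, key_field2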
The Synchronize after merge step then applies the changes marked in the rows to the target table. Note that Synchronize after merge is the only step in PDI (I believe) that requires input be entered in the Advanced tab. There you set the flag field and the values that identify the row operation. After applying the changes the target table will contain the exact same data as the input CSV.
Note also that you can use a Switch/Case or Filter Rows step to do things like remove deletes or updates if you want. I often flow off the "identical" rows and write the rest to a text file so I can examine only the changes.
I looked for visual answers, but the answers were text, so I am adding this visual answer for any Kettle newbie like me.
Case
user-updateslog.csv (has duplicate values) ---> users_table, storing only the latest user details.
Solution
Step 1: Connect the CSV file input to an Insert/Update step, as in the transformation below.
Step 2: In Insert/Update, add conditions that compare the keys to find the candidate row, and set the fields you want updated to "Y".
I need to set up a new company for automated data import. The utility has provided the data in a spreadsheet. (Image 1)
Based on this data, I need to create a stored procedure that will identify the correct meter, if it exists, and perform either an insert or update to the monthly data table. For automated utility data import, I want to make sure I restrict everything to a particular utility company.
The steps are the following (I am having a hard time converting this to SQL):
1- I just want a script that identifies the correct meter to see if it exists, basically checking the Meter# column in the Excel file against the MeterNumber column in the Meters table.
2- The next step is to perform either an insert or an update to the MonthlyData table. This is a screenshot of all its columns.
3- Then I just want to make sure that I am restricting everything to the particular company, which in this case is Site1, since 2 different companies might have the same meter#. The UtilityCompany table contains 3 columns: ID, Name, UtilityType.
I honestly do not know where to get started; would anybody help me with the script? Thank you.
You will want to:
Perform a Bulk Insert operation to take your data from the Excel file into a staging table.
Write a query to select ALL rows for the corresponding utility company (notice I didn't say iterate over each row...). This select could be an update where you set an additional column to mark the row as an INSERT or an UPDATE.
Then the last step (two parts): retrieve all of the rows that were marked as INSERT and insert those into your table; then grab all rows that were marked as UPDATE and update their corresponding values based on your matching criteria. A sketch of the whole flow follows.
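A rough T-SQL sketch of that flow. Every name here (MeterStaging and its Action column, Meters, MonthlyData, MeterNumber, UtilityCompanyID, ReadingDate, Usage, the @UtilityCompanyID parameter, and the file path) is a placeholder to adapt to your actual schema, and it assumes the spreadsheet was first saved as a CSV:

-- 1. Load the spreadsheet export into the staging table
BULK INSERT MeterStaging
FROM 'C:\import\utility_data.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

-- 2. Mark each staged row as an INSERT or an UPDATE,
--    restricted to the one utility company
UPDATE s
SET s.Action = CASE WHEN md.MeterID IS NULL THEN 'INSERT' ELSE 'UPDATE' END
FROM MeterStaging s
JOIN Meters m ON m.MeterNumber = s.MeterNumber
             AND m.UtilityCompanyID = @UtilityCompanyID
LEFT JOIN MonthlyData md ON md.MeterID = m.ID
                        AND md.ReadingDate = s.ReadingDate;

-- 3a. Insert the rows marked INSERT
INSERT INTO MonthlyData (MeterID, ReadingDate, Usage)
SELECT m.ID, s.ReadingDate, s.Usage
FROM MeterStaging s
JOIN Meters m ON m.MeterNumber = s.MeterNumber
             AND m.UtilityCompanyID = @UtilityCompanyID
WHERE s.Action = 'INSERT';

-- 3b. Update the rows marked UPDATE
UPDATE md
SET md.Usage = s.Usage
FROM MonthlyData md
JOIN Meters m ON m.ID = md.MeterID
JOIN MeterStaging s ON s.MeterNumber = m.MeterNumber
                   AND s.ReadingDate = md.ReadingDate
WHERE m.UtilityCompanyID = @UtilityCompanyID
  AND s.Action = 'UPDATE';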
If I want to use a SQL-style query to import around 2000 rows from the original Google spreadsheet to another one, I first have to manually keep increasing the number of rows in the destination sheet before I attempt to import the 2000 rows.
Example query:
=QUERY('Experts'!A:Z,"Select A,C,M where M <=date """&text(H3,"yyyy-mm-dd")&""" and L='Yes' ")
Is there any way for me to use this query directly without first having to manually increase the number of rows to accommodate the imported data?
You do not need to manually add rows; the QUERY function will add them for you. Just make sure the cells the results would expand into are empty, otherwise the formula returns a #REF! error saying the array result was not expanded.