SQL Server - multiple tables' columns in 1 view and under 1 column header

Is it possible to do the following:
I have two tables, Holidays and Allocations, both of which contain startDate and endDate fields. I want to create a view that displays the startDate and endDate fields from both tables under the same column headers, if possible. Can this be done, or do I need to create a single table to handle this?
My reasoning for using a view is that it avoids one large table with many more columns, many of which would contain NULLs wherever a field does not apply.

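Yes, you can do this in a view by using UNION ALL. (Plain UNION also works, but it removes duplicate rows, so a holiday and an allocation with the same Id and dates would collapse into one row.)
CREATE VIEW [dbo].[ViewHolidayAllocation]
AS
SELECT
ROW_NUMBER() OVER(ORDER BY Id) AS RowNum,
*
FROM
(
SELECT Id, startDate, endDate FROM Holidays
-- UNION ALL keeps every row from both tables
UNION ALL
SELECT Id, startDate, endDate FROM Allocations
) AS result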
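
To see how UNION and UNION ALL differ when the two tables are combined under shared column headers, here is a small runnable sketch. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, so syntax details may differ slightly. The sample rows and the Source column (which records where each row came from) are assumptions added for illustration, not part of the original schema.

```python
import sqlite3

# SQLite stands in for SQL Server here; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Holidays    (Id INTEGER, startDate TEXT, endDate TEXT);
    CREATE TABLE Allocations (Id INTEGER, startDate TEXT, endDate TEXT);
    INSERT INTO Holidays    VALUES (1, '2021-01-04', '2021-01-08');
    INSERT INTO Allocations VALUES (1, '2021-01-04', '2021-01-08'),
                                   (2, '2021-02-01', '2021-02-05');

    -- Source is a hypothetical extra column naming the originating table.
    CREATE VIEW ViewHolidayAllocation AS
    SELECT 'Holiday' AS Source, Id, startDate, endDate FROM Holidays
    UNION ALL
    SELECT 'Allocation', Id, startDate, endDate FROM Allocations;
""")

# All three source rows appear under one set of column headers.
view_rows = conn.execute(
    "SELECT Source, Id, startDate, endDate "
    "FROM ViewHolidayAllocation ORDER BY Source, Id"
).fetchall()

# Plain UNION removes duplicates, so the holiday and the allocation that
# share the same Id and dates are merged into a single row.
union_count = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT Id, startDate, endDate FROM Holidays
        UNION
        SELECT Id, startDate, endDate FROM Allocations
    )
""").fetchone()[0]

print(view_rows)
print(union_count)
```

Whether to keep such a Source column depends on whether consumers of the view need to tell holidays and allocations apart.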

A view cannot contain duplicate column names. Either normalize the database, if that makes sense for your data, or give the second field an alias.

Related

In Snowflake, I want to count duplicates in a table based on all the columns in the table without typing out every column name

I have a table with 60 columns in it. I would like to identify how many duplicates there are in the table based on all the columns being identical.
I don't want to have to type out every field name in the SELECT or GROUP BY clauses. Is there a way to do that?
You can use an approach like this for each table:
SELECT
MD5(OBJECT_CONSTRUCT(SRC.*)::VARCHAR) DUP_MD5, SUM(1) AS TOTAL_COUNT
FROM <table> SRC
GROUP BY 1
HAVING SUM(1) > 1;
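
The same goal (grouping on every column without typing the names) can also be reached from client code by reading the column names from the table and generating the GROUP BY list. The sketch below does that against SQLite via Python's sqlite3 module; it is a different technique from the Snowflake hash above, and the helper name count_duplicate_groups and the sample table t are hypothetical.

```python
import sqlite3


def count_duplicate_groups(conn, table):
    """Return (row_values, count) for every row that occurs more than once.

    `table` is interpolated into the SQL text, so pass trusted names only.
    """
    # Read the column names from the table itself so none are typed by hand.
    cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT 0')
    columns = [d[0] for d in cursor.description]
    column_list = ", ".join(f'"{c}"' for c in columns)
    sql = (
        f'SELECT {column_list}, COUNT(*) AS total_count FROM "{table}" '
        f"GROUP BY {column_list} HAVING COUNT(*) > 1"
    )
    # Split each result row into (the duplicated values, how often they occur).
    return [(row[:-1], row[-1]) for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "x", None), (1, "x", None), (2, "y", "z")],
)
duplicates = count_duplicate_groups(conn, "t")
print(duplicates)
```

GROUP BY treats NULLs as equal to each other, so rows that match in every column, including NULL columns, fall into the same group.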

Create a BigQuery view to get the latest rows from a partitioned (and clustered) table

The issue
I'm trying to create a view to get the latest rows from a partitioned table, filtered on the date partition _LOCALDATETIME and zero or more cluster fields. I can create a view which uses a partition and I can create a view which handles some filters, but I can't work out the syntax to achieve both.
An example query requirement
SELECT fieldA, fieldB, fieldC FROM theView
WHERE date between '2021-01-01' and '2021-12-31' AND
_CLUSTERFIELD1 = 'foo'
GROUP BY _CLUSTERFIELD2
ORDER BY _CLUSTERFIELD3
Table schema
_LOCALDATETIME
_id
_CLUSTERFIELD1
_CLUSTERFIELD2
_CLUSTERFIELD3
_CLUSTERFIELD4
...other fields
Based on my understanding of your case, I have come up with this approach.
I created a table partitioned on _LOCALDATETIME with clustered fields, and then a view that returns the data from a defined date range, keeping the last element of each partition based on _id. This gives me a view containing the latest items of a partitioned table within a fixed date range.
view
CREATE VIEW `<my-project-id>.<dataset>.<view>` AS
WITH range_id AS (
SELECT MAX(_id) AS last_id_partition, _localdatetime AS partition_
FROM `<my-project-id>.<dataset>.<table>`
WHERE _localdatetime BETWEEN "2020-01-01" AND "2022-01-01"
GROUP BY _localdatetime)
SELECT s.*
FROM
`<my-project-id>.<dataset>.<table>` s
INNER JOIN range_id r ON s._id = r.last_id_partition AND s._localdatetime = r.partition_
WHERE s._localdatetime BETWEEN "2020-01-01" AND "2022-01-01"
-- Note: SELECT s.* combined with GROUP BY requires every column of the table to appear in the GROUP BY list
GROUP BY _id, _localdatetime, _name, _location
The view returns the row with the highest _id for each partition of the partitioned, clustered table, including the clustered fields, limited to the date range fixed in the view (2020 and 2021).
query
SELECT * FROM `<my-project-id>.<dataset>.<view>`
WHERE _localdatetime BETWEEN '2021-12-21' AND '2021-12-22'
AND <clusteredfield> = 'Venezuela'
This returns the records that match the filter, because the data is already defined in the view.
What you can't do is define the view without the partition field, because it must be present to query a partitioned table. You can also use the queries inside a function to further customize your outputs.
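
The "latest row per partition value, then filter" pattern can be tried locally. The sketch below emulates it with SQLite through Python's sqlite3 module; SQLite has no partitioning or clustering, so it only illustrates the query logic, not BigQuery's pruning behaviour. The table name events, the view name latest_events, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (_localdatetime TEXT, _id INTEGER, _clusterfield1 TEXT);
    INSERT INTO events VALUES
        ('2021-12-21', 1, 'foo'),
        ('2021-12-21', 2, 'foo'),
        ('2021-12-22', 3, 'foo'),
        ('2021-12-22', 4, 'bar');

    -- Keep only the row with the highest _id for each date value.
    CREATE VIEW latest_events AS
    WITH range_id AS (
        SELECT _localdatetime AS partition_, MAX(_id) AS last_id
        FROM events
        GROUP BY _localdatetime
    )
    SELECT s.*
    FROM events AS s
    JOIN range_id AS r
      ON s._id = r.last_id AND s._localdatetime = r.partition_;
""")

# Filters on the date column and on a cluster field are applied to the view,
# that is, after the latest row per date has been chosen.
latest = conn.execute("""
    SELECT _localdatetime, _id, _clusterfield1
    FROM latest_events
    WHERE _localdatetime BETWEEN '2021-12-21' AND '2021-12-22'
      AND _clusterfield1 = 'foo'
    ORDER BY _localdatetime
""").fetchall()

print(latest)
```

Note that the row with _id 3 is not returned even though it matches 'foo': the latest row for that date is _id 4, which the cluster-field filter then excludes. If the intent is "latest row among those matching the filter", the filter has to be applied before the MAX(_id) step instead.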

SQL Query with group by for multiple date ranges

I need to write a T-SQL query and so far I have been unable to do so. The table I need to query is called Operations and has two columns: a foreign key, OperationTypeID, and an OperationDate. The query needs to return the count of each operation type ID within a specified date range.
Through the application interface the user can specify multiple operation type IDs, each with its own date range. For instance, operation type ID 'A' can be searched for in the range 22/04/2010 to 22/04/2012, operation type ID 'B' in 15/10/2012 to 15/11/2013, and so on for other operation type IDs. I need to return a count for each operation type ID within the range specified for that ID.
What is the most efficient way to achieve this in a single T-SQL query, with performance in mind? A rough layout of the expected result is presented below. I am not very good at formatting, so I hope it still gives the idea.
+---------------+----------+----------+-----+
|OperationTypeID|Min date |Max Date |Count|
+---------------+----------+----------+-----+
|A |22/04/2010|22/04/2012|899 |
+---------------+----------+----------+-----+
|B |15/10/2012|15/11/2013|789 |
+---------------+----------+----------+-----+
.... and so on
I would appreciate it if anyone can help. The query needs to return a count for each operation type ID based on the min/max date range specified by the user. The MIN/MAX functions available in SQL Server probably don't apply here. One approach I have thought of so far is to write a separate query for each operation type ID and its date range and then combine them with UNION ALL. Would that have any performance impact?
You will need to store the search criteria somewhere. The best place would probably be a temporary table with the following columns:
CREATE TABLE #SearchCriteria (
OperationTypeId VARCHAR(1),
MinDate DATETIME,
MaxDate DATETIME
)
Once you have populated this table, a simple query like this should give you what you want:
SELECT OperationTypeId,
MinDate,
MaxDate,
(SELECT COUNT(*) FROM Operations
WHERE OperationDate BETWEEN SC.MinDate AND SC.MaxDate
AND OperationTypeId = SC.OperationTypeId) AS [Count]
FROM
#SearchCriteria SC
If you must have everything in a single query (without using a temporary table), do something like this:
SELECT OperationTypeId,
MinDate,
MaxDate,
(SELECT COUNT(*) FROM Operations
WHERE OperationDate BETWEEN SC.MinDate AND SC.MaxDate
AND OperationTypeId = SC.OperationTypeId) AS [Count]
FROM
(VALUES ('A', '20100422', '20120422') -- YYYYMMDD strings convert to DATETIME the same way under any DATEFORMAT setting
,('B', '20121015', '20131115')
/* ... etc ... */
) SC(OperationTypeId, MinDate, MaxDate)
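
For experimenting with the criteria-table idea outside SQL Server, here is a minimal sketch using SQLite through Python's sqlite3 module. It uses a LEFT JOIN with GROUP BY instead of the correlated subquery, which should give the same counts. The sample Operations rows and the OperationCount alias are invented for illustration, and dates are stored as ISO text because SQLite has no DATETIME type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Operations (OperationTypeId TEXT, OperationDate TEXT);
    INSERT INTO Operations VALUES
        ('A', '2010-05-01'), ('A', '2011-01-15'), ('A', '2013-01-01'),
        ('B', '2012-11-01'), ('B', '2010-06-01');
    CREATE TEMP TABLE SearchCriteria (
        OperationTypeId TEXT, MinDate TEXT, MaxDate TEXT
    );
""")

# One row per operation type with its own date range, as the user entered it.
criteria = [
    ("A", "2010-04-22", "2012-04-22"),
    ("B", "2012-10-15", "2013-11-15"),
]
conn.executemany("INSERT INTO SearchCriteria VALUES (?, ?, ?)", criteria)

# LEFT JOIN keeps criteria rows that match nothing; COUNT(o.OperationDate)
# counts only matched operations, so those rows report 0.
counts = conn.execute("""
    SELECT sc.OperationTypeId, sc.MinDate, sc.MaxDate,
           COUNT(o.OperationDate) AS OperationCount
    FROM SearchCriteria AS sc
    LEFT JOIN Operations AS o
      ON o.OperationTypeId = sc.OperationTypeId
     AND o.OperationDate BETWEEN sc.MinDate AND sc.MaxDate
    GROUP BY sc.OperationTypeId, sc.MinDate, sc.MaxDate
    ORDER BY sc.OperationTypeId
""").fetchall()

print(counts)
```

Passing the criteria as bound parameters, as executemany does here, also avoids building date literals into the SQL text.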

Create table of unique values from table of duplicates

I have a table with 500,000+ rows and the following columns:
Symbol, ExternalCode, ExternalCodeType, StartDate
Symbol should be unique but it's not.
There are a handful of rows (~60) that have the same value for Symbol but have a different ExternalCode+StartDate pair.
I want to create a table of uniques so that, when there are multiple entries for the same Symbol, I only take the one with the most recent StartDate.
Is there a simple/elegant way to do this?
In SQL-Server this can be solved without JOINing.
Try this:
SELECT *
FROM (SELECT SYMBOL,
STARTDATE,
EXTERNALCODE,
EXTERNALCODETYPE,
Row_number()
OVER (
PARTITION BY SYMBOL
ORDER BY STARTDATE DESC) RN
FROM TABLENAME) T
WHERE T.RN = 1
The ROW_NUMBER function numbers the rows within each Symbol partition, ordered by StartDate descending, so each symbol has its own sequence and the row with the most recent StartDate always gets 1.
Hope the answer is clear.
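
To try the ROW_NUMBER approach end to end, including materialising the result as a new table, here is a small sketch using SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The table names Symbols and UniqueSymbols and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Symbols (
        Symbol TEXT, ExternalCode TEXT, ExternalCodeType TEXT, StartDate TEXT
    );
    INSERT INTO Symbols VALUES
        ('ABC', 'X1', 'T', '2019-01-01'),
        ('ABC', 'X2', 'T', '2020-06-01'),
        ('XYZ', 'Y1', 'T', '2018-03-01');

    -- Number rows per Symbol, newest StartDate first, and keep only row 1.
    CREATE TABLE UniqueSymbols AS
    SELECT Symbol, ExternalCode, ExternalCodeType, StartDate
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY Symbol ORDER BY StartDate DESC
               ) AS rn
        FROM Symbols
    ) AS t
    WHERE t.rn = 1;
""")

unique_rows = conn.execute(
    "SELECT Symbol, ExternalCode, StartDate FROM UniqueSymbols ORDER BY Symbol"
).fetchall()

print(unique_rows)
```

If two rows for the same Symbol share the most recent StartDate, ROW_NUMBER picks one of them arbitrarily; adding a second ORDER BY key makes the choice deterministic.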

Normalizing a table, from one to the other

I'm trying to normalize a MySQL database.
I currently have a table that contains 11 columns for "categories". The first column is a user_id and the other 10 are category_id_1 through category_id_10. Some rows may only have a value in category_id_1, with the remaining columns NULL.
I then have a table that has 2 columns, user_id and category_id...
What is the best way to transfer all of the data into separate rows in table 2 without adding a row for columns that are NULL in table 1?
thanks!
You can create a single query to do all the work, it just takes a bit of copy and pasting, and adjusting the column name:
INSERT INTO table2
SELECT * FROM (
SELECT user_id, category_id_1 AS category_id FROM table1
UNION ALL
SELECT user_id, category_id_2 FROM table1
UNION ALL
SELECT user_id, category_id_3 FROM table1
-- ... repeat the UNION ALL branch for category_id_4 through category_id_10
) AS T
WHERE category_id IS NOT NULL;
Since you only have to do this 10 times, and you can throw the code away when you are finished, I would think that this is the easiest way.
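
If copy and paste feels error-prone, the ten UNION ALL branches can be generated by a short script instead. The sketch below builds the statement in Python and runs it against SQLite via the sqlite3 module as a stand-in for MySQL; table1 and table2 follow the names used above, while the sample rows are invented.

```python
import sqlite3

N_CATEGORIES = 10

conn = sqlite3.connect(":memory:")
column_defs = ", ".join(
    f"category_id_{i} INTEGER" for i in range(1, N_CATEGORIES + 1)
)
conn.execute(f"CREATE TABLE table1 (user_id INTEGER, {column_defs})")
conn.execute("CREATE TABLE table2 (user_id INTEGER, category_id INTEGER)")

# User 1 has two categories, user 2 has one; all other columns are NULL.
conn.execute("INSERT INTO table1 VALUES (1, 7, 9" + ", NULL" * 8 + ")")
conn.execute("INSERT INTO table1 VALUES (2, 5" + ", NULL" * 9 + ")")

# One SELECT per category column, stacked with UNION ALL.
branches = [
    f"SELECT user_id, category_id_{i} AS category_id FROM table1"
    for i in range(1, N_CATEGORIES + 1)
]
unpivot_sql = (
    "INSERT INTO table2 (user_id, category_id) "
    "SELECT user_id, category_id FROM ("
    + " UNION ALL ".join(branches)
    + ") AS t WHERE category_id IS NOT NULL"
)
conn.execute(unpivot_sql)

result = conn.execute(
    "SELECT user_id, category_id FROM table2 ORDER BY user_id, category_id"
).fetchall()

print(result)
```

Printing unpivot_sql instead of executing it gives a statement that can be pasted into a MySQL client for a one-off migration.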
One table for users:
users(id, name, username, etc)
One for categories:
categories(id, category_name)
One to link the two, including any extra information you might want on that join.
categories_users(user_id, category_id)
-- or with extra information --
categories_users(user_id, category_id, date_created, notes)
Transferring the data to the link table is a matter of writing a series of SQL INSERT statements. There's probably some awesome way to do it in one go, but since there are only 10 category columns, just copy and paste, IMO:
INSERT INTO categories_users (user_id, category_id)
SELECT user_id, category_id_1
FROM old_categories
WHERE category_id_1 IS NOT NULL
-- repeat for category_id_2 through category_id_10