EndDate on Dimension Table - Should we go with NULL or 99991231 Date Value [closed] - sql

I am building a data warehouse on SQL Server and I was wondering what the best approach is for handling the current record in a dimension table (SCD type 2) with respect to the 'end_date' attribute.
For the current record, we have the option of using a date literal such as '12/31/9999' or specifying it as NULL. The dimension tables also have an additional 'current_flag' attribute in addition to 'start_date' and 'end_date'.
It is probably a minor design decision, but I just wanted to see if there are any advantages of using one over the other, whether in query performance or in any other way.

I have seen systems written both ways. Personally, I go for the infinite end date rather than NULL, and the reason is simple: it is easier to validate that the type-2 records are properly tiled, with no gaps or overlaps. I prefer one validation to two -- the other being the validation of the is_current flag. There is also only one correct way of accessing the data.
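For example, with an infinite end date the tiling check can be a single query along these lines (a rough sketch; the dim_customer table and the customer_id/start_date/end_date columns are illustrative, and end_date is assumed to be inclusive):

with ordered as (
    select customer_id, start_date, end_date,
           lead(start_date) over (partition by customer_id order by start_date) as next_start
    from dim_customer
)
select *
from ordered
where next_start is not null
  and next_start <> dateadd(day, 1, end_date);   -- any row returned is a gap or an overlap

With NULL end dates, the same check would need a coalesce on end_date plus a separate rule for which row is allowed to be NULL, which is exactly the second validation I would rather avoid.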
That said, a system that I'm currently working on also publishes a view with only the current records. That is handy.
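A sketch of such a view, assuming an end_date of '9999-12-31' marks the current row (names are illustrative):

create view dim_customer_current as
select *
from dim_customer
where end_date = '9999-12-31';   -- or: where is_current = 1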
That system is not in SQL Server. One optimization that you can attempt is clustering so that the current records are all colocated -- assuming they are accessed much more often. You can do this using either method. Using a clustered index like this makes updates more expensive, but it can be handy for optimizing memory.
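In SQL Server terms, one way to express that idea is to put the clustering key on end_date, so all current rows (end_date = '9999-12-31') sit together physically; a sketch, with illustrative names:

create clustered index cix_dim_customer_end_date
    on dim_customer (end_date, customer_id);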

Related

How to use binary search algorithm without index in Database [closed]

Let's say I want to run this query
select *
from table
where column_1 = 12
I know how binary search works; maybe if I create an index on column_1, the DBMS will use binary search.
The question here is: how can I use a different algorithm in this case after creating the index, and is that even applicable?
SQL is a declarative language, meaning you define what you want to achieve, but how it should be done is determined by the database engine.
In some cases/products you can force the behavior, but in general what algorithm is used to get the results is not controlled by the user.
Most database engines will try to get the desired results in the most efficient way, which the engine determines based on the information it has about the query and the underlying data.
Indexes help the database engine understand the data by providing information about possible values, their selectivity, etc., but in the end the database engine decides whether the index will be used or not.
Say you have an index on a table that stores the details of users. The index is on the column 'created_at', which is the time when the record was created.
Let's now say that you started the business on 2019-09-01. If you have a query like SELECT * FROM users WHERE created_at > '2019-01-01', the database engine could use the index, but every record will match the WHERE condition, so the engine will most probably decide to scan the clustered key instead of using the index, because seeking the index and then doing a lookup for every record would need more resources than simply reading the entire table.
However, if you execute the query with a different date, say 2021-09-01, the index will most probably be used.
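A small sketch of that example (table and index names are assumptions):

create index ix_users_created_at on users (created_at);

-- Matches essentially every row: the optimizer will likely scan the table rather than use the index.
select * from users where created_at > '2019-01-01';

-- Selective predicate: the optimizer will likely seek the index and look up only the matching rows.
select * from users where created_at > '2021-09-01';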

Does the SQL NOT IN operator scale well? [closed]

I'm writing an app where users take quizzes, so my goal is to show quizzes that the user hasn't tackled before. For this reason I'm using SELECT id, name, problem FROM quizzes WHERE id NOT IN (...).
Imagine that there will be thousands of ids and quizzes.
Is that OK? How does it scale? Or do I need to redesign something, use a database better suited for this, or use another technique to achieve my purpose?
If you have a fixed list, then it should be fine.
If you have a subquery, then I strongly encourage not exists:
select f.*
from foo f
where not exists (select 1 from bar b where b.quiz_id = f.quiz_id)
I recommend this based on the semantics of not exists versus not in. not exists handles NULL values more intuitively.
That said, with appropriate indexes, not exists also often has the better performance in most databases.
You should consider that there are limits on SQL statement length imposed by each database engine. Though I haven't tested those limits, 1k values in an IN operator should still work well for most databases; I would think that if you scale it up to 10k or more, it could hit some databases' limits and your statements will fail.
I would suggest rethinking this solution unless you can verify the worst possible case (with maximum parameters) still works well.
Usually a subquery can do the job, instead of manually sending 1k parameters or assembling a big SQL statement by concatenating strings.
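A sketch of that subquery approach for the quiz case, assuming a hypothetical completed_quizzes table that records which quizzes each user has already taken, and a @user_id parameter:

select q.id, q.name, q.problem
from quizzes q
where not exists (
    select 1
    from completed_quizzes c
    where c.quiz_id = q.id
      and c.user_id = @user_id
);

-- A supporting index along these lines helps the anti-join scale with the number of rows:
create index ix_completed_quizzes_user_quiz on completed_quizzes (user_id, quiz_id);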

What's best practice for designing a table with many access points in Dynamodb? [closed]

So I am working on an application that is designed to be highly searchable - almost every field in the table will be queryable. Let's call the table "job." It would look something like this in pseudo code:
jobID: string
accountID: string
type: string
keywords: Array<string>
salaryLow: number
salaryHigh: number
numberOfApplications: number
numberOfViews: number
title: string
postedDate: string
description: string
location: Location
So in this application, I would like to be able to order/query by all of these fields. However, I'm wary of creating a global secondary index for all of these fields because that seems like it's an anti-pattern. If I add an index for each of these fields, I believe that each write operation would take some time to be eventually consistent.
Currently, I have jobID set up as the partition key and keywords as the sort key, but that doesn't make it very flexible for querying the other fields without resorting to a full table scan.
Can anyone give advice on this? Very new to Dynamodb. Thanks!
Again, DynamoDB isn’t the best solution if you want to use a single service. But it could be a great solution if you use it with ElasticSearch or even AWS CloudSearch.
You can create a DynamoDB Stream to forward the data updates in the table directly to CloudSearch. Then you just need to use the CloudSearch endpoint to make the queries and then recover the hash key or even show all values directly from CloudSearch.
I don’t know for sure, but maybe the CloudSearch will cost less than all the planned indexes...
As mentioned in the comments as well, Dynamo wouldn't really be an ideal database for the problem. You would even end up overwriting a lot of data depending on the sort key and the partition key.
Looking at the columns, the ideal scenario would be to use something like Mongo and create indexes on the columns that need to be frequently queried.

Putting my attendance based on a calendar and inserting it into my database [closed]

I am making an attendance system. I have a database that I created with one table per month from Jan-Dec (12 tables), and each table has columns like (Jan1, Jan2 // Feb1, Feb2 // Mar1, Mar2 ... etc.). I know this is not good practice, but I'm not familiar with SQL. How would I be able to use far fewer tables/columns, driven by the date picker in my VB.NET program?
Delve deeper into relational database design (take the link as a first step).
One approach is to create just one table with a column of type DATE or DATETIME to denote the date. Additional columns would hold the related data linked to that date. That would simplify your table structure greatly: from 12 tables with approx. 30 columns each, to just one table with a date column plus columns for the related information.
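A minimal sketch of that single-table design, with illustrative names and a date supplied by the VB.NET date picker:

create table Attendance (
    StudentID      int     not null,
    AttendanceDate date    not null,
    Status         char(1) not null,   -- e.g. 'P' = present, 'A' = absent
    constraint PK_Attendance primary key (StudentID, AttendanceDate)
);

-- One row per student per date; the date comes from the date picker in the VB.NET program:
insert into Attendance (StudentID, AttendanceDate, Status)
values (42, '2019-09-01', 'P');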

Represent Is"Something" Data in SQL-Server [closed]

Joe Celko (SQL guru) says that we should not use proprietary data types, and especially that we should refrain from machine-level things like bit or byte, since SQL Server uses a high-level language; basically, the principle of data modeling is data abstraction. Following that recommendation for fields like "IsActive" etc., what would the correct choice of data type be - one that is very portable and is interpreted clearly by front-end layers? Thanks!
In SQL Server, I would go for the BIT data type, as it matches the abstract requirements you describe: it can hold two values (which map to Yes and No by the widely used convention of Yes = 1 and No = 0), and it can allow an additional NULL value if desired.
If possible, using native data types has all the benefits of performance, clarity and understandability for others. Not to mention the principle of not overcomplicating things when you can keep them simple.
SQL Server doesn't have a Boolean data type so Boolean is out of the question. BIT is a numeric type that accepts the values 0 and 1 as well as null. I usually prefer a CHAR type with a CHECK constraint permitting values like "Y"/"N" or "T"/"F". CHAR at least lets you extend the set of values to more than just two if you want to.
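For illustration, a minimal sketch of both options (table and column names are assumptions):

create table Users_Bit (
    UserID   int not null primary key,
    IsActive bit not null default 1              -- 1 = yes, 0 = no
);

create table Users_Char (
    UserID   int not null primary key,
    IsActive char(1) not null default 'Y'
        check (IsActive in ('Y', 'N'))           -- easy to extend to more than two values later
);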
BIT has the potential disadvantage that it's non-standard, not particularly user-friendly and not well understood even by SQL Server users. The semantics of BIT are very peculiar in SQL Server and even Microsoft's own products treat BIT in inconsistent ways.