SQL Server Data cleaning [closed] - sql

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 3 years ago.
I need some help.
I am about to start a data mining project that uses SQL Server as the database.
The database is large, and before I start I need to clean the data.
What cleaning steps do you suggest, beyond removing duplicates and trimming spaces from some columns? What else should I do to make sure the data is ready for data mining techniques such as clustering and decision trees?
I would also appreciate any useful videos on data mining in general, or on cleaning data in SQL Server using SQL Server Management Studio.
Thanks a lot in advance.

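If you want to delete duplicates, you can use a CTE (common table expression) with ROW_NUMBER(). I recommend this website for further information on how to do it: SQLServerTutorial
WITH cte AS (
    SELECT
        YourColumns,
        ROW_NUMBER() OVER (
            PARTITION BY
                YourColumns  -- columns that define a duplicate
            ORDER BY
                YourColumns  -- decides which row in each group gets row_num = 1
        ) row_num
    FROM
        YourTable
)
-- Keep the first row of each group and delete the rest
DELETE FROM cte
WHERE row_num > 1;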
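Before deleting anything, it can help to see which rows are duplicated and how often. The following is a minimal sketch; the table dbo.Customers and the columns Email and LastName are hypothetical stand-ins for your own table and for the columns you would put in PARTITION BY.

```sql
-- Hypothetical names: dbo.Customers, Email, LastName.
-- Lists every duplicated (Email, LastName) combination with its number of copies.
SELECT Email, LastName, COUNT(*) AS copies
FROM dbo.Customers
GROUP BY Email, LastName
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```

If this returns no rows, the DELETE above would have nothing to remove for that choice of columns.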
Regarding removing blank spaces, I recommend TRIM (SQL Server 2017 and later) or LTRIM and RTRIM (all versions). For further information: W3Schools
SELECT TRIM(YourColumn) FROM YourTable;
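The SELECT only trims the value in the result set. To clean the stored data you would need an UPDATE. This is a sketch under assumptions: dbo.YourTable and YourColumn are placeholder names, and the column is assumed to be varchar or nvarchar.

```sql
-- Hypothetical names: dbo.YourTable, YourColumn (assumed varchar/nvarchar).
BEGIN TRANSACTION;

UPDATE dbo.YourTable
SET YourColumn = LTRIM(RTRIM(YourColumn))
-- DATALENGTH counts stored bytes including trailing spaces,
-- so only rows that actually change are touched.
WHERE DATALENGTH(YourColumn) <> DATALENGTH(LTRIM(RTRIM(YourColumn)));

-- Number of rows the UPDATE changed
SELECT @@ROWCOUNT AS rows_trimmed;

COMMIT TRANSACTION;  -- or ROLLBACK TRANSACTION to undo
```

LTRIM and RTRIM are used here instead of TRIM so the statement also runs on versions older than SQL Server 2017.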

Related

Find Record in Recordset SQL Server [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
I am building a website in classic ASP and am using the SQL statement below. The database is SQL Server 2000.
SELECT * from dbo.PDBproductview where product LIKE '" & partnumbersearch &"%';"
We now also need forward and back buttons to step to the next and previous part number, and I am not sure how to achieve this. My initial thought was to run another SQL query to find the position of the part in the part table (product) and then pick out the parts before and after it. Is this possible?
The LEAD and LAG functions do what you want, but only if you can use SQL Server 2012 or higher; they are not available in SQL Server 2000.
Here is a pretty good article you can use as a guide. Essentially it looks something like this:
SELECT LAG(p.FirstName) OVER (ORDER BY p.BusinessEntityID) AS PreviousValue,
       p.FirstName,
       LEAD(p.FirstName) OVER (ORDER BY p.BusinessEntityID) AS NextValue
FROM Person.Person p;
With the LEAD and LAG functions, you can indicate how far back or forward you want to look.
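The offset is the optional second argument, and a third argument supplies a default for rows that have no neighbour at that distance. The sketch below first shows that on the same Person.Person example, then an alternative that avoids window functions for older servers. In the second part, the variable @current, its varchar(50) type and the sample value are assumptions for illustration; dbo.PDBproductview and product come from the question.

```sql
-- Look two rows back and two rows ahead; N'(none)' is returned at the edges.
-- Requires SQL Server 2012 or higher.
SELECT p.FirstName,
       LAG(p.FirstName, 2, N'(none)')  OVER (ORDER BY p.BusinessEntityID) AS TwoBack,
       LEAD(p.FirstName, 2, N'(none)') OVER (ORDER BY p.BusinessEntityID) AS TwoAhead
FROM Person.Person p;

-- Alternative without window functions.
-- Assumption: @current holds the part number currently displayed.
DECLARE @current varchar(50);
SET @current = 'ABC-100';  -- hypothetical part number

-- Previous part: the largest product that sorts before the current one
SELECT TOP 1 *
FROM dbo.PDBproductview
WHERE product < @current
ORDER BY product DESC;

-- Next part: the smallest product that sorts after the current one
SELECT TOP 1 *
FROM dbo.PDBproductview
WHERE product > @current
ORDER BY product ASC;
```

Each button would run one of the two TOP 1 queries with the part number of the page being shown. An empty result means there is no previous or next part.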

Sql databases select command [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 8 years ago.
I am new to databases, and our teacher gave us a pretty hard assignment. There are two tables: the first is named abilities (the abilities of superheroes :) ) and the second is named superheros.
We have to select the nickname of each superhero who has two abilities, together with the average (mean) range of those abilities.
Image of both tables:
Original here: http://postimg.org/image/85pqbc47n/
I will not give you the solution. After all, it's homework and you have to learn something :) But I can give you some advice: do one task at a time.
First, find the superheroes who have exactly 2 abilities (you can actually do this by querying only the abilities table).
Second, find the average range of abilities for all superheroes (here you'll need a join).
Then combine your queries.
Take a look at JOIN, GROUP BY, COUNT and HAVING.
Don't feel bad if you can't write it on the first attempt. Your query is not super easy, but I'm sure you can do this.
You can use HAVING and AVG() for this:
SELECT s.NickName, AVG(a.Range)
FROM abilities a
JOIN superhero s
    ON a.ID_SuperHero = s.ID_SuperHero
GROUP BY s.NickName
HAVING COUNT(DISTINCT a.Abilities) = 2  -- exactly two abilities, as the question asks

Overcoming the reserved word "IN" in sql server [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions concerning problems with code you've written must describe the specific problem — and include valid code to reproduce it — in the question itself. See SSCCE.org for guidance.
Closed 9 years ago.
Just for reference I am using SQL Azure.
I noticed that when I select data from a table by license plate and the state of that plate, I get no results back if the state is "IN". I realize the word "IN" is reserved in SQL Server; however, I am enclosing it in quotes in my query. I am currently in the testing phase and have only one record in the table, with license plate 287YGB and state IN.
If I write my query as follows I get nothing back.
SELECT MakeModel, CitizenID, VehicleID FROM tblVehicles WHERE tblVehicles.Lisence = '287YGB' AND tblVehicles.PlateState = 'IN'
If I write my query this way I get back my result. But this is not good enough.
SELECT MakeModel, CitizenID, VehicleID FROM tblVehicles WHERE tblVehicles.Lisence = '287YGB'
And finally, if I write my query this way I get the only row in the table.
SELECT MakeModel, CitizenID, VehicleID FROM tblVehicles
From these tests I can see that the last WHERE condition is causing the problem. I assume this is because the word "IN" is reserved. Is there a way around this?
Reserved words usually only cause problems when you use them as identifiers such as field names, and in that case you need to wrap them in brackets ("[]") to eliminate the problem. Inside a quoted string literal, 'IN' is just text. I can almost guarantee that your PlateState has some garbage in it, so you need to either trim it first (LTRIM(RTRIM(PlateState)) = 'IN') or use LIKE '%IN%' instead, and this will return the results you expect. Be aware that LIKE '%IN%' also matches any other value that contains "IN".
Try this:
SELECT MakeModel, CitizenID, VehicleID FROM tblVehicles WHERE tblVehicles.Lisence = '287YGB' AND LTRIM(RTRIM(tblVehicles.PlateState)) = 'IN'
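To find out what the "garbage" actually is, you can look at the stored bytes. SQL Server ignores trailing spaces in = comparisons, so plain trailing padding would not normally cause this, while leading spaces or non-printing characters such as a tab or a line break would. LTRIM and RTRIM remove spaces only, so if the value contains other characters the trimmed query can still fail. A diagnostic sketch using the table and columns from the question (the varbinary(32) length is an arbitrary choice):

```sql
-- Show the stored value next to its length and raw bytes.
SELECT PlateState,
       LEN(PlateState)                   AS char_count,  -- LEN ignores trailing spaces
       DATALENGTH(PlateState)            AS byte_count,  -- counts every stored byte
       CAST(PlateState AS varbinary(32)) AS raw_bytes    -- a clean varchar 'IN' is 0x494E
FROM tblVehicles
WHERE Lisence = '287YGB';
```

Any byte other than 49 and 4E in raw_bytes for a varchar column (for example 20 for a space, 09 for a tab, 0D or 0A for a line break) points to the character that breaks the comparison.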

Display endless amount of data in sql [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 9 years ago.
If I have an endless amount of data, can I display all of it in SQL?
I know there is obviously SELECT *, but then it will never complete.
Is there a command for this?
You can use TOP to select a subset of the total records:
SELECT TOP 100 * FROM YourTable
This returns 100 records.
With an ORDER BY clause you can specify the order that determines which 100 records are returned; without it, the choice of rows is not guaranteed.
If you are asking about the limits of the SQL Server database management system, please see this link: Maximum Capacity Specification of SQL Server
For example:
Max databases per instance of SQL Server (both 32-bit and 64-bit) = 32,767
Usually you will prefer some kind of paging, since an application cannot actually show an "endless amount of data" in a user-friendly way.
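One way to page in SQL Server 2012 or higher is OFFSET ... FETCH. This is a minimal sketch, not the only approach; dbo.YourTable, the Id column and the two variables are hypothetical names chosen for the example.

```sql
-- Hypothetical names: dbo.YourTable, Id (a column that gives a stable sort order).
DECLARE @PageNumber int = 3,    -- 1-based page to show
        @PageSize   int = 100;  -- rows per page

SELECT *
FROM dbo.YourTable
ORDER BY Id                                    -- OFFSET/FETCH requires ORDER BY
OFFSET (@PageNumber - 1) * @PageSize ROWS      -- skip the earlier pages
FETCH NEXT @PageSize ROWS ONLY;                -- return one page
```

With these values the query skips 200 rows and returns at most the next 100, so the application only ever handles one page at a time.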

Tool for making diagram from SQL query [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
I have this complicated SQL query for Oracle that I want to visualize as a diagram to make it understandable for my co-workers. I tried http://snowflakejoins.com, but it just chokes on it.
Does anyone have a better suggestion? I would prefer a web app; failing that, a desktop app for Windows.
with
logs as (
    select
        l.job_id,
        l.subjob,
        sum(l.verwerkt) verwerkt,
        sum(l.errors) errors,
        max(l.datum) laatst
    from
        dinf_monitor_logs l,
        dinf_monitor_jobs j
    where
        l.datum>sysdate-j.dagen
        and j.job_id=l.job_id(+)
    group by
        l.job_id,
        l.subjob
),
alllogs as (
    select job_id, subjob, max(datum) laatst from dinf_monitor_logs group by job_id, subjob
)
select row_number() over(order by alllogs.job_id, alllogs.subjob) r,
    alllogs.job_id,
    alljobs.naam,
    alllogs.subjob,
    logs.verwerkt,
    logs.errors,
    alllogs.laatst datum,
    alljobs.wikilink,
    alljobs.loglink,
    alljobs.contact,
    case
        when alllogs.laatst is null then 1
        when round(sysdate-(alllogs.laatst+alljobs.dagen))<0 then 0
        else round(sysdate-(alllogs.laatst+alljobs.dagen))
    end overtijd,
    case
        when logs.errors-alljobs.max_errors>0 then 5
        when logs.verwerkt-alljobs.min_verwerkt<0 then 7
        when round(sysdate-(alllogs.laatst+alljobs.dagen))>0 then 3
        else 11
    end status
from logs, alllogs, (select job_id, naam, wikilink, loglink, contact, dagen, min_verwerkt, max_errors from dinf_monitor_jobs) alljobs
where
    logs.job_id(+)=alllogs.job_id
    and logs.subjob(+)=alllogs.subjob
    and alllogs.job_id=alljobs.job_id
order by alllogs.job_id, alllogs.subjob
You can use the "Query Builder" tab of Oracle SQL Developer.
The result for your sample query was shown here as a screenshot.
Each of the subqueries is a data set. I would write a plain English statement of what the query does, describe the data sets and how they relate to one another in entity-relationship terms, and then show how the query satisfies the plain English statement. You can draw the E-R diagram with any of a variety of tools.
I have found how to do it in Toad, which I prefer to SQL Developer.
Open the editor window, paste the SQL, right-click in the editor window and select "Send to queryviewer".
My SQL above is too complicated for this technique, but it's nice to know I can use it in the future with more "normal" queries.
Points to Sergio.