Selecting/updating single field from duplicates - sql

This is my Sql:
SELECT * FROM parameter1
WHERE parameter1_ID IN (34,11)
this results in over 400 results (which is normal), most results have multiple values in VALUE column and some of the results have duplicate and multiple values in the VALUE column. I want to get rid of the duplicate VALUEs within their respective results, what can I add to this query to do so?
Using MS SQL studio
example result from query:
ID VALUE
1 100,200
2 100,100,200
3 200,200,300
4 200,200,300
result I want
ID VALUE
1 100,200
2 100,200
3 200,300
4 200,300

As many others have said, the way you are storing data is very bad practice and will almost certainly cause many more headaches in the future.
That said, if you are actually unable to change this there are still options for you. In SQL Server 2017 you have the benefit of string_split and string_agg:
declare #t table(ID int, val varchar(40))
insert into #t values
(1,'100,200')
,(2,'100,100,200')
,(3,'200,200,300')
,(4,'200,200,300');
with s as
(
select distinct t.ID
,s.[value] as val
from #t as t
cross apply string_split(t.val,',') as s
)
select s.ID
,string_agg(s.val,',') as val
from s
group by s.ID;
Output
+----+---------+
| ID | val |
+----+---------+
| 1 | 100,200 |
| 2 | 100,200 |
| 3 | 200,300 |
| 4 | 200,300 |
+----+---------+

Related

SQL displaying results as columns from rows

SQL Noob here.
I realize that many variations to this question have been asked but none seems to work or be fully applicable to my annoying situation, ie. I dont think PIVOT would work for what I require. I cant fathom the necessary words to google what I need efficiently.
I have the following query:
Select w.WORKORDERID, y.Answer
From
[SD].[dbo].[WORKORDERSTATES] w
LEFT JOIN [SD].[dbo].[WO_RESOURCES] x
ON w.workorderid = x.woid
Full Outer Join [SD].[dbo].ResourcesQAMapping y
ON x.UID = y.MAPPINGID
WHERE w.APPR_STATUSID = '2'
AND w.STATUSID = '1'
AND w.REOPENED = 'false'
It will bring back the following result:
+-----------+---------------------+
| WORKORDER | Answer |
+-----------+---------------------+
| 55693 | Brad Pitt |
| 55693 | brad.pitt#mycom.com |
| 55693 | Location |
| 55693 | NULL |
| 55693 | george |
+-----------+---------------------+
I would like all rows related to the value 55693 to output as columns like below:
+-----------+-----------+---------------------+----------+--------+--------+
| WORKORDER | VALUE1 | VALUE2 | VALUE3 | VALUE4 | VALUE5 |
+-----------+-----------+---------------------+----------+--------+--------+
| 55693 | Brad Pitt | brad.pitt#mycom.com | Location | NULL | george |
+-----------+-----------+---------------------+----------+--------+--------+
There will always be the same amount of values, and I am almost sure that the solution involves creating a temporary table but I cant get it to work any which way.
Any help would be greatly appreciated.
If you always have the same number of values (5) you can use a static PIVOT, otherwise you need a dynamic TSQL statement with PIVOT.
In both cases you'll need to add a column to guarantee rows/columns ordering otherwise there is no guarantee that you'll see the correct value in each column.
Here is a sample query thet uses a static PIVOT on 5 values (but remember to add a column to properly order the data replacing ORDER BY WORKORDER with ORDER BY YOUR_COLUMN_NAME):
declare #tmp table (WORKORDER int, Answer varchar(50))
insert into #tmp values
(55693, 'Brad Pitt')
,(55693, 'brad.pitt#mycom.com')
,(55693, 'Location')
,(55693, 'NULL')
,(55693, 'george')
select * from
(
select
WORKORDER,
Answer,
CONCAT('VALUE', ROW_NUMBER() OVER (PARTITION BY WORKORDER ORDER BY WORKORDER)) AS COL
from #tmp
) as src
pivot
(
max(Answer) for COL in ([VALUE1], [VALUE2], [VALUE3], [VALUE4], [VALUE5])
)
as pvt
Results:
Try to select another column that has different values as answer column and try to run pivot and that will work

SQL Server stored procedure inserting duplicate rows

I have a table with column GetDup and I'd like to the duplicate records based on the value of this column. For example, if value on is 1 in GetDup, then duplicate the record once. If value in the column is 2, then duplicate the record twice and so on and the statement has to be in looping statement.
What will be a good way to write a stored procedures for this? Please help.
Input:
+--------+--------------+---------------+
| Getdup | CustomerName | CustomerAdd |
+--------+--------------+---------------+
| 1 | John | 123 SomeWhere |
| 2 | Bob | 987 SomeWhere |
+--------+--------------+---------------+
What I want:
+--------+--------------+---------------+
| Getdup | CustomerName | CustomerAdd |
+--------+--------------+---------------+
| 1 | John | 123 SomeWhere |
| 1 | John | 123 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
+--------+--------------+---------------+
picture of data
Answer #2 After Clarification
Number Table to the Rescue!
The number table in my example (or tally table, if you want to call it that), is both temporary and very small. To make it bigger, just add more values to z and add more CROSS JOINs. In my opinion, a number table and a calendar table are both things that should be in every database you have. They are extremely useful.
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE mytable ( Getdup int, CustomerName varchar(10), CustomerAdd varchar(20) ) ;
INSERT INTO mytable (Getdup, CustomerName, CustomerAdd)
VALUES (1,'John','123 SomeWhere'), (2,'Bob','987 SomeWhere')
;
Query 1:
;WITH z AS (
SELECT *
FROM ( VALUES(0),(0),(0),(0) ) v(x)
)
, numTable AS (
SELECT num
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY z1.x)-1 num
FROM z z1
CROSS JOIN z z2
) s1
)
SELECT t1.Getdup, t1.CustomerName, t1.CustomerAdd
FROM mytable t1
INNER JOIN numTable ON t1.getdup >= numTable.num
ORDER BY CustomerName, CustomerAdd
Results:
| Getdup | CustomerName | CustomerAdd |
|--------|--------------|---------------|
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 1 | John | 123 SomeWhere |
| 1 | John | 123 SomeWhere |
--------------------------------------------------------------------------
ORIGINAL ANSWER
EDIT: After further clarification of the problem, this won't duplicate rows, this will only duplicate the data in a column.
Something like one of these might work.
T-SQL
SELECT replicate(mycolumn,getdup) AS x
FROM mytable
MySQL
SELECT repeat(mycolumn,getdup) AS x
FROM mytable
Oracle SQL
SELECT rpad(mycolumn,getdup*length(mycolumn),mycolumn) AS x
FROM mytable
PostgreSQL
SELECT repeat(mycolumn,getdup+1) AS x
FROM mytable
If you can provide more details for exactly what you want and what you're working with, we might be able to help you better.
NOTE 2: Depending on what you need, you may need to do some math magic. You say above if GetDup is 1 then you want one duplicate. If that means that your output should be GetDup``GetDup, then you'll want to add one in the repeat(),replicate() or rpad() functions. ie replicate(mycolumn,getdup+1). Oracle SQL will be a little different, since it uses rpad().
In standard SQL you can use a recursive CTE:
with recursive cte as (
select t.dup, . . .
from t
union all
select cte.dup - 1, . . .
from cte
where cte.dup > 1
)
select *
from cte;
Of course, not all databases support recursive CTEs (and the recursive keyword is not used in some of them).
So, you want recursive solution :
with t as (
select Getdup, CustomerName, CustomerAdd, 0 as id
from table
union all
select Getdup, CustomerName, CustomerAdd, id + 1
from t
where id < getdup
)
insert into table (col1, col2, col3)
select Getdup, CustomerName, CustomerAdd
from t
order by getdup
option (maxrecursion 0);

Counting the total number of rows with SELECT DISTINCT ON without using a subquery

I have performing some queries using PostgreSQL SELECT DISTINCT ON syntax. I would like to have the query return the total number of rows alongside with every result row.
Assume I have a table my_table like the following:
CREATE TABLE my_table(
id int,
my_field text,
id_reference bigint
);
I then have a couple of values:
id | my_field | id_reference
----+----------+--------------
1 | a | 1
1 | b | 2
2 | a | 3
2 | c | 4
3 | x | 5
Basically my_table contains some versioned data. The id_reference is a reference to a global version of the database. Every change to the database will increase the global version number and changes will always add new rows to the tables (instead of updating/deleting values) and they will insert the new version number.
My goal is to perform a query that will only retrieve the latest values in the table, alongside with the total number of rows.
For example, in the above case I would like to retrieve the following output:
| total | id | my_field | id_reference |
+-------+----+----------+--------------+
| 3 | 1 | b | 2 |
+-------+----+----------+--------------+
| 3 | 2 | c | 4 |
+-------+----+----------+--------------+
| 3 | 3 | x | 5 |
+-------+----+----------+--------------+
My attemp is the following:
select distinct on (id)
count(*) over () as total,
*
from my_table
order by id, id_reference desc
This returns almost the correct output, except that total is the number of rows in my_table instead of being the number of rows of the resulting query:
total | id | my_field | id_reference
-------+----+----------+--------------
5 | 1 | b | 2
5 | 2 | c | 4
5 | 3 | x | 5
(3 rows)
As you can see it has 5 instead of the expected 3.
I can fix this by using a subquery and count as an aggregate function:
with my_values as (
select distinct on (id)
*
from my_table
order by id, id_reference desc
)
select count(*) over (), * from my_values
Which produces my expected output.
My question: is there a way to avoid using this subquery and have something similar to count(*) over () return the result I want?
You are looking at my_table 3 ways:
to find the latest id_reference for each id
to find my_field for the latest id_reference for each id
to count the distinct number of ids in the table
I therefore prefer this solution:
select
c.id_count as total,
a.id,
a.my_field,
b.max_id_reference
from
my_table a
join
(
select
id,
max(id_reference) as max_id_reference
from
my_table
group by
id
) b
on
a.id = b.id and
a.id_reference = b.max_id_reference
join
(
select
count(distinct id) as id_count
from
my_table
) c
on true;
This is a bit longer (especially the long thin way I write SQL) but it makes it clear what is happening. If you come back to it in a few months time (somebody usually does) then it will take less time to understand what is going on.
The "on true" at the end is a deliberate cartesian product because there can only ever be exactly one result from the subquery "c" and you do want a cartesian product with that.
There is nothing necessarily wrong with subqueries.

sql insert from table to table

I have a table Farm with these columns
FarmID:(primary)
Kenizy:
BarBedo:
BarBodo:
MorKodo:
These columns are palm types in some language. each column of those contains a number indicates the number of this type of palm inside a farm.
Example:
FarmID | Kenizy | BarBedo | BarBodo | MorKodo
-----------------------------------------------
3 | 20 | 12 | 45 | 60
22 | 21 | 9 | 41 | 3
I want to insert that table into the following tables:
Table Palm_Farm
FarmID:(primary)
PalmID;(primary)
PalmTypeName:
Count:
That table connects each farm with each palm type.
Example:
FarmID | PalmID | PalmTypeName | Count
-----------------------------------------------
3 | 1 | Kenizy | 20
3 | 2 | BarBedo | 12
3 | 3 | BarBodo | 45
3 | 4 | MorKodo | 60
22 | 1 | Kenizy | 21
22 | 2 | BarBedo | 9
22 | 3 | BarBodo | 41
22 | 4 | MorKodo | 3
I have to use the following table Palms in order to take the PalmID column.
PalmID:(primary)
PlamTypeName:
...other not important columns
This table is to save information about each palm type.
Example:
PalmID | PlamTypeName
-------------------------
1 | Kenizy
2 | BarBedo
3 | BarBodo
4 | MorKodo
The PalmTypeName column has the value the same as the COLUMN NAMES in the Farm table.
So my question is:
How to insert the data from Farm table to Palm_Farm considering that the PalmID exist in the Palm table
I hope I could make my question clear, I tried to solve my problem myself but the fact that the column name in the Farm table must be the column value in the Palm_Farm table couldn't know how to do it.
I can't change the table structure because we are trying to help a customer with this already existing tables
I am using SQL Server 2008 so Merge is welcomed.
Update
After the genius answer by #GarethD, I got this exception
You can use UNPIVOT to turn the columns into rows:
INSERT Palm_Farm (FarmID, PalmID, PalmTypeName, [Count])
SELECT upvt.FarmID,
p.PalmID,
p.PalmTypeName,
upvt.[Count]
FROM Farm AS f
UNPIVOT
( [Count]
FOR PalmTypeName IN ([Kenizy], [BarBedo], [BarBodo], [MorKodo])
) AS upvt
INNER JOIN Palms AS p
ON p.PalmTypeName = upvt.PalmTypeName;
Example on SQL Fiddle
The docs for UNPIVOT state:
UNPIVOT performs almost the reverse operation of PIVOT, by rotating columns into rows. Suppose the table produced in the previous example is stored in the database as pvt, and you want to rotate the column identifiers Emp1, Emp2, Emp3, Emp4, and Emp5 into row values that correspond to a particular vendor. This means that you must identify two additional columns. The column that will contain the column values that you are rotating (Emp1, Emp2,...) will be called Employee, and the column that will hold the values that currently reside under the columns being rotated will be called Orders. These columns correspond to the pivot_column and value_column, respectively, in the Transact-SQL definition.
To explain further how unpivot works, I will look at the first row original table:
FarmID | Kenizy | BarBedo | BarBodo | MorKodo
-----------------------------------------------
3 | 20 | 12 | 45 | 60
So what UPIVOT will do is look for columns specified in the UNPIVOT statement, and create a row for each column:
SELECT upvt.FarmID, upvt.PalmTypeName, upvt.[Count]
FROM Farm AS f
UNPIVOT
( [Count]
FOR PalmTypeName IN ([Kenizy], [BarBedo])
) AS upvt;
So here you are saying, for every row find the columns [Kenizy] and [BarBedo] and create a row for each, then for each of these rows create a new column called PalmTypeName that will contain the column name used, then put the value of that column into a new column called [Count]. Giving a result of:
FarmID | Kenizy | Count |
---------------------------
3 | Kenizy | 20 |
3 | BarBedo | 12 |
If you are running SQL Server 2000, or a later version with a lower compatibility level, then you may need to use a different query:
INSERT Palm_Farm (FarmID, PalmID, PalmTypeName, [Count])
SELECT f.FarmID,
p.PalmID,
p.PalmTypeName,
[Count] = CASE upvt.PalmTypeName
WHEN 'Kenizy' THEN f.Kenizy
WHEN 'BarBedo' THEN f.BarBedo
WHEN 'BarBodo' THEN f.BarBodo
WHEN 'MorKodo' THEN f.MorKodo
END
FROM Farm AS f
CROSS JOIN
( SELECT PalmTypeName = 'Kenizy' UNION ALL
SELECT PalmTypeName = 'BarBedo' UNION ALL
SELECT PalmTypeName = 'BarBodo' UNION ALL
SELECT PalmTypeName = 'MorKodo'
) AS upvt
INNER JOIN Palms AS p
ON p.PalmTypeName = upvt.PalmTypeName;
This is similar, but you have to create the additional rows yourself using UNION ALL inside the subquery upvt, then choose the value for [Count] using a case expression.
To update when the row exists you can use MERGE
WITH Data AS
( SELECT upvt.FarmID,
p.PalmID,
p.PalmTypeName,
upvt.[Count]
FROM Farm AS f
UNPIVOT
( [Count]
FOR PalmTypeName IN ([Kenizy], [BarBedo], [BarBodo], [MorKodo])
) AS upvt
INNER JOIN Palms AS p
ON p.PalmTypeName = upvt.PalmTypeName
)
MERGE Palm_Farm WITH (HOLDLOCK) AS pf
USING Data AS d
ON d.FarmID = pf.FarmID
AND d.PalmID = pf.PalmID
WHEN NOT MATCHED BY TARGET THEN
INSERT (FarmID, PalmID, PalmTypeName, [Count])
VALUES (d.FarmID, d.PalmID, d.PalmTypeName, d.[Count])
WHEN MATCHED THEN
UPDATE
SET [Count] = d.[Count],
PalmTypeName = d.PalmTypeName;

In SQL, what's the difference between count(column) and count(*)?

I have the following query:
select column_name, count(column_name)
from table
group by column_name
having count(column_name) > 1;
What would be the difference if I replaced all calls to count(column_name) to count(*)?
This question was inspired by How do I find duplicate values in a table in Oracle?.
To clarify the accepted answer (and maybe my question), replacing count(column_name) with count(*) would return an extra row in the result that contains a null and the count of null values in the column.
count(*) counts NULLs and count(column) does not
[edit] added this code so that people can run it
create table #bla(id int,id2 int)
insert #bla values(null,null)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,null)
select count(*),count(id),count(id2)
from #bla
results
7 3 2
Another minor difference, between using * and a specific column, is that in the column case you can add the keyword DISTINCT, and restrict the count to distinct values:
select column_a, count(distinct column_b)
from table
group by column_a
having count(distinct column_b) > 1;
A further and perhaps subtle difference is that in some database implementations the count(*) is computed by looking at the indexes on the table in question rather than the actual data rows. Since no specific column is specified, there is no need to bother with the actual rows and their values (as there would be if you counted a specific column). Allowing the database to use the index data can be significantly faster than making it count "real" rows.
The explanation in the docs, helps to explain this:
COUNT(*) returns the number of items in a group, including NULL values and duplicates.
COUNT(expression) evaluates expression for each row in a group and returns the number of nonnull values.
So count(*) includes nulls, the other method doesn't.
We can use the Stack Exchange Data Explorer to illustrate the difference with a simple query. The Users table in Stack Overflow's database has columns that are often left blank, like the user's Website URL.
-- count(column_name) vs. count(*)
-- Illustrates the difference between counting a column
-- that can hold null values, a 'not null' column, and count(*)
select count(WebsiteUrl), count(Id), count(*) from Users
If you run the query above in the Data Explorer, you'll see that the count is the same for count(Id) and count(*)because the Id column doesn't allow null values. The WebsiteUrl count is much lower, though, because that column allows null.
The COUNT(*) sentence indicates SQL Server to return all the rows from a table, including NULLs.
COUNT(column_name) just retrieves the rows having a non-null value on the rows.
Please see following code for test executions SQL Server 2008:
-- Variable table
DECLARE #Table TABLE
(
CustomerId int NULL
, Name nvarchar(50) NULL
)
-- Insert some records for tests
INSERT INTO #Table VALUES( NULL, 'Pedro')
INSERT INTO #Table VALUES( 1, 'Juan')
INSERT INTO #Table VALUES( 2, 'Pablo')
INSERT INTO #Table VALUES( 3, 'Marcelo')
INSERT INTO #Table VALUES( NULL, 'Leonardo')
INSERT INTO #Table VALUES( 4, 'Ignacio')
-- Get all the collumns by indicating *
SELECT COUNT(*) AS 'AllRowsCount'
FROM #Table
-- Get only content columns ( exluce NULLs )
SELECT COUNT(CustomerId) AS 'OnlyNotNullCounts'
FROM #Table
COUNT(*) – Returns the total number of records in a table (Including NULL valued records).
COUNT(Column Name) – Returns the total number of Non-NULL records. It means that, it ignores counting NULL valued records in that particular column.
Basically the COUNT(*) function return all the rows from a table whereas COUNT(COLUMN_NAME) does not; that is it excludes null values which everyone here have also answered here.
But the most interesting part is to make queries and database optimized it is better to use COUNT(*) unless doing multiple counts or a complex query rather than COUNT(COLUMN_NAME). Otherwise, it will really lower your DB performance while dealing with a huge number of data.
Further elaborating upon the answer given by #SQLMeance and #Brannon making use of GROUP BY clause which has been mentioned by OP but not present in answer by #SQLMenace
CREATE TABLE table1 (
id INT
);
INSERT INTO table1 VALUES
(1),
(2),
(NULL),
(2),
(NULL),
(3),
(1),
(4),
(NULL),
(2);
SELECT * FROM table1;
+------+
| id |
+------+
| 1 |
| 2 |
| NULL |
| 2 |
| NULL |
| 3 |
| 1 |
| 4 |
| NULL |
| 2 |
+------+
10 rows in set (0.00 sec)
SELECT id, COUNT(*) FROM table1 GROUP BY id;
+------+----------+
| id | COUNT(*) |
+------+----------+
| 1 | 2 |
| 2 | 3 |
| NULL | 3 |
| 3 | 1 |
| 4 | 1 |
+------+----------+
5 rows in set (0.00 sec)
Here, COUNT(*) counts the number of occurrences of each type of id including NULL
SELECT id, COUNT(id) FROM table1 GROUP BY id;
+------+-----------+
| id | COUNT(id) |
+------+-----------+
| 1 | 2 |
| 2 | 3 |
| NULL | 0 |
| 3 | 1 |
| 4 | 1 |
+------+-----------+
5 rows in set (0.00 sec)
Here, COUNT(id) counts the number of occurrences of each type of id but does not count the number of occurrences of NULL
SELECT id, COUNT(DISTINCT id) FROM table1 GROUP BY id;
+------+--------------------+
| id | COUNT(DISTINCT id) |
+------+--------------------+
| NULL | 0 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
+------+--------------------+
5 rows in set (0.00 sec)
Here, COUNT(DISTINCT id) counts the number of occurrences of each type of id only once (does not count duplicates) and also does not count the number of occurrences of NULL
It is best to use
Count(1) in place of column name or *
to count the number of rows in a table, it is faster than any format because it never go to check the column name into table exists or not
There is no difference if one column is fix in your table, if you want to use more than one column than you have to specify that how much columns you required to count......
Thanks,
As mentioned in the previous answers, Count(*) counts even the NULL columns, whereas count(Columnname) counts only if the column has values.
It's always best practice to avoid * (Select *, count *, …)