How to get data from mssql with similar description? - sql

So I have a table like:
id | description | code | unit_value | short
1 | awesome product DG | CODEB14 | null | BT
2 | awesome product | CODE14 | 5005 | NOBT
3 | product less awe BGO | CODEB15 | null | BT
4 | product less awe | CODE15 | 5006 | NOBT
I need to display 'unit_value' for the items whose description ends in DG or BGO, taking it from the matching item without that suffix. So the item 'awesome product DG' should show the same 'unit_value' as the 'awesome product' item. However, I cannot assign the value to the rows where short = 'BT'.
What I have so far are two queries that I somehow want to merge:
select value_i_need from my_table where short= 'BT'
select value_i_need from my_table where short!= 'BT' and description like '%awesome product%'
I have no idea how to merge those two queries. Any suggestions would be very helpful.

You need to join two copies of the table together:
CREATE TABLE #mytable
(
id INT,
description VARCHAR(50),
code VARCHAR(10),
unitvalue INT NULL,
short VARCHAR(10)
)
INSERT INTO #mytable
(
id,
description,
code,
unitvalue,
short
)
VALUES
(1, 'awesome product DG' , 'CODEB14' , null ,'BT'),
(2, 'awesome product' , 'CODE14' , 5005 ,'NOBT'),
(3, 'product less awe BGO' , 'CODEB15' , null ,'BT'),
(4, 'product less awe' , 'CODE15' , 5006 ,'NOBT');
SELECT a.description, a.code, a.short, b.description, b.code, b.short, b.unitvalue
FROM #myTable a
LEFT OUTER JOIN #myTable b ON a.description LIKE b.description + '%'
AND b.short != 'BT'
WHERE a.short = 'BT'
However, this makes a lot of assumptions: that there is only one such item for each row, and that you don't have products with similar names where the LIKE would confuse the two. Also, joining on a LIKE is going to be slow at any real volume. So although this works on this trivial example data, I'm not sure I'd recommend you actually use it.
It feels to me like this data should not all be in the same table. You should have one table with the BT entries, and another with the NOBT entries and a foreign key to the BT table. Maybe? It's not totally clear what the data represents, but this might point you in the right direction.
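One way to soften the LIKE caveats above is to strip a known suffix and match the base row by equality. This is only a sketch under assumptions: the `#items` and `#suffix` temp tables are made up for illustration, and it assumes every BT description is the base description plus a space and one of the listed suffixes.

```sql
-- Hypothetical demo tables; the names and the suffix list are assumptions.
CREATE TABLE #items
(
    id INT,
    description VARCHAR(50),
    unit_value INT NULL,
    short VARCHAR(10)
);
INSERT INTO #items (id, description, unit_value, short)
VALUES (1, 'awesome product DG',   NULL, 'BT'),
       (2, 'awesome product',      5005, 'NOBT'),
       (3, 'product less awe BGO', NULL, 'BT'),
       (4, 'product less awe',     5006, 'NOBT');

CREATE TABLE #suffix (suffix VARCHAR(10));
INSERT INTO #suffix (suffix) VALUES ('DG'), ('BGO');

-- Strip ' <suffix>' from each BT description, then match the base row exactly.
SELECT bt.id, bt.description, base.unit_value
FROM #items AS bt
INNER JOIN #suffix AS s
    ON bt.description LIKE '% ' + s.suffix
LEFT JOIN #items AS base
    ON base.short <> 'BT'
   AND base.description = LEFT(bt.description, LEN(bt.description) - LEN(s.suffix) - 1)
WHERE bt.short = 'BT'
ORDER BY bt.id;
```

On the sample rows this pairs id 1 with 5005 and id 3 with 5006. The only LIKE left is against the small suffix list, so the lookup of the base row can use an ordinary index on description.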

Do you just want OR?
select value_i_need
from my_table
where short = 'BT' or
(short <> 'BT' and description like '%here is the name%')

You could use code like the query below. You need table aliases (T1 and T2 below) to tell the two references to the table apart. This is a scalar sub-query that assumes there is exactly one match. I'll point out that LIKE will cause problems, with multiple rows returned, if you have more than one product that matches.
select (
select unit_value
from my_table T2
WHERE T2.description like '%awesome product%'
AND T2.short = 'NOBT'
)
from my_table T1
where T1.short= 'BT' AND T1.description LIKE '%awesome product%'
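If more than one row can match, a lookup that is correlated to the outer row and picks a single best candidate avoids the multi-row error. The following is a minimal sketch, not a definitive fix: `#products` is a stand-in for `my_table`, row 5 is a made-up row added to create an ambiguous match, and "the longest NOBT description that prefixes the BT description wins" is an assumed rule.

```sql
-- Stand-in for my_table; row 5 is hypothetical and only there to create ambiguity.
CREATE TABLE #products
(
    id INT,
    description VARCHAR(50),
    unit_value INT NULL,
    short VARCHAR(10)
);
INSERT INTO #products (id, description, unit_value, short)
VALUES (1, 'awesome product DG',   NULL, 'BT'),
       (2, 'awesome product',      5005, 'NOBT'),
       (3, 'product less awe BGO', NULL, 'BT'),
       (4, 'product less awe',     5006, 'NOBT'),
       (5, 'awesome',              1,    'NOBT');

-- For each BT row, keep only the longest NOBT description that prefixes it.
SELECT T1.id, T1.description, m.unit_value
FROM #products AS T1
OUTER APPLY
(
    SELECT TOP (1) T2.unit_value
    FROM #products AS T2
    WHERE T2.short = 'NOBT'
      AND T1.description LIKE T2.description + ' %'
    ORDER BY LEN(T2.description) DESC
) AS m
WHERE T1.short = 'BT'
ORDER BY T1.id;
```

Here 'awesome product DG' matches both 'awesome' and 'awesome product', and TOP (1) with the ORDER BY keeps 5005. Descriptions containing LIKE wildcards such as % or _ would need escaping before being used as a pattern.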

Related

Combine multiple rows with different column values into a single one

I'm trying to create a single row starting from multiple ones and combining them based on different column values; here is the result i reached based on the following query:
select distinct ID, case info when 'name' then value end as 'NAME', case info when 'id' then value end as 'serial'
FROM TABLENAME t
WHERE info = 'name' or info = 'id'
However, the expected result should be a single row per ID, with both values on the same row.
I tried with group by clauses but that doesn't seem to work.
The RDBMS is Microsoft SQL Server.
Thanks
SELECT X.ID, MAX(X.NAME) AS NAME, MAX(X.SERIAL) AS SERIAL
FROM
(
SELECT 100 AS ID, NULL AS NAME, '24B6-97F3' AS SERIAL UNION ALL
SELECT 100, 'A', NULL UNION ALL
SELECT 200, NULL, '8113-B600' UNION ALL
SELECT 200, 'B', NULL
) X
GROUP BY X.ID
For me GROUP BY works
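The same GROUP BY idea can be applied directly to the ID / info / value shape from the question by moving the CASE expressions inside the aggregate. A minimal sketch, assuming a `#info` temp table (a made-up stand-in for TABLENAME) loaded with the values shown in this thread:

```sql
-- Stand-in for TABLENAME; the rows mirror the values used in the answers here.
CREATE TABLE #info (ID INT, info VARCHAR(10), value VARCHAR(20));
INSERT INTO #info (ID, info, value)
VALUES (100, 'name', 'A'),
       (100, 'id',   '24B6-97F3'),
       (200, 'name', 'B'),
       (200, 'id',   '8113-B600');

-- MAX skips the NULLs produced by the non-matching CASE branch,
-- so each ID collapses to one row carrying both values.
SELECT ID,
       MAX(CASE WHEN info = 'name' THEN value END) AS NAME,
       MAX(CASE WHEN info = 'id'   THEN value END) AS serial
FROM #info
WHERE info IN ('name', 'id')
GROUP BY ID
ORDER BY ID;
```

The original query used DISTINCT, which cannot merge the two half-filled rows per ID; the aggregate is what folds them together.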
A simple PIVOT operator can achieve this for dynamic results:
SELECT *
FROM
(
SELECT id AS id_column, info, value
FROM tablename
) src
PIVOT
(
MAX(value) FOR info IN ([name], [id])
) piv
ORDER BY id_column ASC;
Result:
| id_column | name | id |
|-----------|------|------------|
| 100 | a | 24b6-97f3 |
| 200 | b | 8113-b600 |
I'm a fan of a self join for things like this
SELECT tName.ID, tName.Value AS Name, tSerial.Value AS Serial
FROM TableName AS tName
INNER JOIN TableName AS tSerial ON tSerial.ID = tName.ID AND tSerial.Info = 'id'
WHERE tName.Info = 'name'
This initially selects only the name rows, then self joins on the same IDs and filters the joined copy to the serial rows (the ones where Info is 'id'). You may want to change the INNER JOIN to a LEFT JOIN if not everything has both a name and a serial and you want to know which names don't have a serial.

SQL get top level object from joins

Working on a query right now where we want to understand which business is referring the most downstream orders for us. I've put together a very basic table for demonstration purposes here with 4 businesses listed. Bar and Donut were both ultimately referred by Foo, and I want to be able to show that Foo as a business has generated X number of orders. Obviously, getting the single referral for Foo (from Bar) and Bar (from Donut) is a simple join. But how do you go from Bar to get back to Foo?
I'll add that I've done some more googling this morning and found a few very similar questions about finding the top-level parent, and most of the responses suggest a recursive CTE. It's been a while since I've dug deep into SQL, but 8 years ago I know these were not overly popular. Is there another way around this? Would it perhaps be better to just store that parent ID on the order table at the time of the order?
+----+--------+--------------------+
| Id | Name | ReferralBusinessId |
+----+--------+--------------------+
| 1 | Foo | |
| 2 | Bar | 1 |
| 3 | Donut | 2 |
| 4 | Coffee | |
+----+--------+--------------------+
WITH RECURSIVE entity_hierarchy AS (
SELECT id, name, parent FROM entities WHERE name = 'Donut'
UNION
SELECT e.id, e.name, e.parent FROM entities e INNER JOIN entity_hierarchy eh on e.id = eh.parent
)
SELECT id, name, parent FROM entity_hierarchy;
Assuming you're using SQL Server, you could use a query like the one below to generate a hierarchical Id path for a particular business.
declare @tbl as table (Id int, Name varchar(30), ReferralBusinessId int)
insert into @tbl (Id, Name, ReferralBusinessId) values
(1, 'Foo', null),
(2, 'Bar', 1),
(3, 'Donut', 2),
(4, 'Coffee', null);
;WITH business AS (
SELECT Id, Name, ReferralBusinessId
, 0 AS Level
, CAST(Id AS VARCHAR(255)) AS Path
FROM @tbl
UNION ALL
SELECT R.Id, R.Name, R.ReferralBusinessId
, b.Level + 1
, CAST(b.Path + '.' + CAST(R.Id AS VARCHAR(255)) AS VARCHAR(255))
FROM @tbl R
INNER JOIN business b ON b.Id = R.ReferralBusinessId
)
SELECT * FROM business ORDER BY Path
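To get from a path to the order count the question asks about, the recursion can carry the top-level Id down to every descendant and then join to the orders. This is a sketch under assumptions: the `#Orders` table, its columns, and its rows are invented for illustration, and "downstream" is taken to mean orders placed by referred businesses, not by the top-level business itself.

```sql
CREATE TABLE #Business (Id INT PRIMARY KEY, Name VARCHAR(30), ReferralBusinessId INT NULL);
INSERT INTO #Business (Id, Name, ReferralBusinessId)
VALUES (1, 'Foo', NULL), (2, 'Bar', 1), (3, 'Donut', 2), (4, 'Coffee', NULL);

-- Hypothetical orders table: one order by Bar, two by Donut, one by Coffee.
CREATE TABLE #Orders (OrderId INT PRIMARY KEY, BusinessId INT);
INSERT INTO #Orders (OrderId, BusinessId)
VALUES (1, 2), (2, 3), (3, 3), (4, 4);

WITH roots AS
(
    -- Anchor: businesses nobody referred are their own root.
    SELECT Id, Id AS RootId
    FROM #Business
    WHERE ReferralBusinessId IS NULL
    UNION ALL
    -- Recursive step: a referred business inherits its referrer's root.
    SELECT b.Id, r.RootId
    FROM #Business AS b
    INNER JOIN roots AS r ON b.ReferralBusinessId = r.Id
)
SELECT top_level.Name, COUNT(o.OrderId) AS DownstreamOrders
FROM roots AS r
INNER JOIN #Business AS top_level ON top_level.Id = r.RootId
LEFT JOIN #Orders AS o
    ON o.BusinessId = r.Id
   AND r.Id <> r.RootId          -- skip the root's own orders
GROUP BY top_level.Name
ORDER BY top_level.Name;
```

With these rows Foo ends up with 3 downstream orders and Coffee with 0. Storing the root Id on the order at write time, as the question suggests, would trade this recursion for a plain GROUP BY, at the cost of keeping that column correct if referrals ever change.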

Rotate rows into columns with column names not coming from the row

I've looked at some answers but none of them seem to be applicable to me.
Basically I have this result set:
RowNo | Id | OrderNo |
1 101 1
2 101 10
I just want to convert this to
| Id | OrderNo_0 | OrderNo_1 |
101 1 10
I know I should probably use PIVOT. But the syntax is just not clear to me.
There are always exactly two order numbers per Id.
And if you want to use PIVOT then the following works with the data provided:
declare @Orders table (RowNo int, Id int, OrderNo int)
insert into @Orders (RowNo, Id, OrderNo)
select 1, 101, 1 union all select 2, 101, 10
select Id, [1] OrderNo_0, [2] OrderNo_1
from (
select RowNo, Id, OrderNo
from @Orders
) SourceTable
pivot (
sum(OrderNo)
for RowNo in ([1],[2])
) as PivotTable
Reference: https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-2017
Note: To build each row in the result set, the pivot function groups by the columns that are not being pivoted. Therefore you need an aggregate function on the column that is being pivoted. You won't notice it in this instance because you have unique rows to start with, but if you had multiple rows with the same RowNo and Id you would find that the aggregation comes into play.
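The effect of that aggregate is easiest to see with a duplicate. In this sketch the `#OrderRows` temp table and its third row are made up; the third row repeats RowNo 2 for the same Id so that SUM has two values to combine.

```sql
CREATE TABLE #OrderRows (RowNo INT, Id INT, OrderNo INT);
INSERT INTO #OrderRows (RowNo, Id, OrderNo)
VALUES (1, 101, 1),
       (2, 101, 10),
       (2, 101, 5);   -- made-up duplicate of RowNo 2 for Id 101

-- PIVOT groups by Id (the only column not pivoted or aggregated),
-- so the two RowNo = 2 rows are summed into one cell.
SELECT Id, [1] AS OrderNo_0, [2] AS OrderNo_1
FROM
(
    SELECT RowNo, Id, OrderNo
    FROM #OrderRows
) AS SourceTable
PIVOT
(
    SUM(OrderNo) FOR RowNo IN ([1], [2])
) AS PivotTable;
-- OrderNo_0 = 1, OrderNo_1 = 15 (10 + 5)
```

Swapping SUM for MAX or MIN would pick one of the duplicates instead of adding them, which is often the safer choice when the pivoted column is an identifier and not a quantity.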
As you say there are only ever two order numbers per ID, you could join the result set to itself on the ID column. For the purposes of the example below, I'm assuming your result set is merely selecting from a single Orders table, but it should be easy enough to replace this with your existing query. The join compares with < rather than <> so that each pair of orders comes back once instead of twice.
SELECT o1.ID, o1.OrderNo AS [OrderNo_0], o2.OrderNo AS [OrderNo_1]
FROM Orders AS o1
INNER JOIN Orders AS o2
ON (o1.ID = o2.ID AND o1.OrderNo < o2.OrderNo)
Given your sample data, the simplest option is to use the MIN and MAX functions:
SELECT Id, MIN(OrderNo) OrderNo_0, MAX(OrderNo) OrderNo_1
FROM T
GROUP BY Id

SQL Unpivot / Coalesce multiple columns into one column based on values

I am trying to unpivot / coalesce multiple columns into one value, based on values in the target columns.
Given the following sample data:
CREATE TABLE SourceData (
Id INT
,Column1Text VARCHAR(10)
,Column1Value INT
,Column2Text VARCHAR(10)
,Column2Value INT
,Column3Text VARCHAR(10)
,Column3Value INT
)
INSERT INTO SourceData
SELECT 1, NULL, NULL, NULL, NULL, NULL, NULL UNION
SELECT 2, 'Text', 1, NULL, NULL, NULL, NULL UNION
SELECT 3, 'Text', 2, 'Text 2', 1, NULL, NULL UNION
SELECT 4, NULL, NULL, NULL, 1, NULL, NULL
I am trying to produce the following result set:
Id ColumnText
----------- ----------
1 NULL
2 Text
3 Text 2
4 NULL
Where ColumnXText column values become one "ColumnText" value per row, based on the following criteria:
If all ColumnX columns are NULL, then ColumnText = NULL
If a ColumnXValue value is "1" and the ColumnXText IS NULL, then
ColumnText = NULL
If a ColumnXValue value is "1" and the ColumnXText IS NOT NULL, then
ColumnText = ColumnXText.
There are no records with more than one ColumnXValue of "1".
What I've tried is in this SQL Fiddle: http://sqlfiddle.com/#!6/f2e18/2
I tried (shown in the SQL Fiddle):
Unpivoting with CROSS / OUTER APPLY. I fell down on this approach because I was not able to get the WHERE conditions to produce the expected results.
I also tried using UNPIVOT, but had no luck.
I was thinking of a brute-force approach that did not seem to be correct. The real source table has 44MM rows. I do not control the schema of the source table.
Please let me know if there's a simpler approach than a brute-force tangle of CASE WHENs. Thank you.
I don't think there is much mileage in trying to be too clever with this
SELECT
Id,
CASE
WHEN Column1Value = 1 THEN Column1Text
WHEN Column2Value = 1 THEN Column2Text
WHEN Column3Value = 1 THEN Column3Text
END AS ColumnText
FROM
SourceData
I did have them in 3, 2, 1 order, but considered that the right answer might be hit sooner if the checking is done in 1, 2, 3 order instead (fewer checks might be significant if there are 44 million rows).
Considering you have 44 million rows, you really don't want to experiment too much with joining the table to itself with APPLY or something like that. You need to go through it just once, and that's best done with a simple CASE, what you call the "brute-force" approach:
SELECT
Id
, CASE WHEN Column1Value = 1 THEN Column1Text
WHEN Column2Value = 1 THEN Column2Text
WHEN Column3Value = 1 THEN Column3Text
END AS ColumnText
FROM SourceData
But if you really want to get fancy and write something without CASE, you could use UNION ALL to merge the different columns into one, and then join on it:
WITH CTE_Union AS
(
SELECT Id, Column1Text AS ColumnText, Column1Value AS ColumnValue
FROM SourceData
UNION ALL
SELECT Id, Column2Text, Column2Value FROM SourceData
UNION ALL
SELECT Id, Column3Text, Column3Value FROM SourceData
)
SELECT s.Id, u.ColumnText
FROM SourceData s
LEFT JOIN CTE_Union u ON s.Id = u.id and u.ColumnValue = 1
But I guarantee the first approach will outperform this by a margin of 4 to 1.
If you do not want to use a case expression, then you can use another outer apply() on a common table expression (or subquery/derived table) of your original unpivot with outer apply():
;with cte as (
select s.Id, oa.ColumnText, oa.ColumnValue
from sourcedata s
outer apply (values
(s.Column1Text, s.Column1Value)
, (s.Column2Text, s.Column2Value)
, (s.Column3Text, s.Column3Value)
) oa (ColumnText, ColumnValue)
)
select s.Id, x.ColumnText
from sourcedata s
outer apply (
select top 1 cte.ColumnText
from cte
where cte.Id = s.Id
and cte.ColumnValue = 1
) x
rextester demo: http://rextester.com/TMBR41346
returns:
+----+------------+
| Id | ColumnText |
+----+------------+
| 1 | NULL |
| 2 | Text |
| 3 | Text 2 |
| 4 | NULL |
+----+------------+
This will give you the first non-null text value, checking Column3Text first, then Column2Text, then Column1Text. It seems as if this is what you are trying to accomplish.
select ID, Coalesce(Column3Text,Column2Text,Column1Text) ColumnText
from SourceData
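COALESCE and the CASE-on-value approach agree on the sample data, but they are not equivalent, because COALESCE never looks at the ColumnXValue flags. The sketch below shows one row where they diverge; the `#SourceDemo` table is a made-up copy of the schema, and the row with Id 5 is hypothetical (a flagged column whose text is NULL while an unflagged column has text).

```sql
CREATE TABLE #SourceDemo
(
    Id INT,
    Column1Text VARCHAR(10), Column1Value INT,
    Column2Text VARCHAR(10), Column2Value INT,
    Column3Text VARCHAR(10), Column3Value INT
);
INSERT INTO #SourceDemo
VALUES (3, 'Text', 2, 'Text 2', 1, NULL, NULL),   -- from the question
       (5, 'Text', 2, NULL,     1, NULL, NULL);   -- hypothetical row

SELECT Id,
       CASE WHEN Column1Value = 1 THEN Column1Text
            WHEN Column2Value = 1 THEN Column2Text
            WHEN Column3Value = 1 THEN Column3Text
       END AS ByValueFlag,
       COALESCE(Column3Text, Column2Text, Column1Text) AS ByCoalesce
FROM #SourceDemo
ORDER BY Id;
-- Id 3: both columns return 'Text 2'
-- Id 5: ByValueFlag is NULL (rule 2 in the question), ByCoalesce is 'Text'
```

Whether that difference matters depends on whether rows like Id 5 can occur in the real table, which the question does not say.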

High performance PostgreSQL: Calculate the difference between two sets of key-value tables and store the result

You may consider me a PostgreSQL beginner, and the purpose of this question is to get insights into how to get the best performance out of PostgreSQL for this problem. I have two tables which are identical in their structure but differ in their content.
|Table A|
key - value
1 dave
2 paul
3 michael
|Table B|
key - value
1 dave
2 chris
The problem is simple, to replace table A with table B, but to know which entries were inserted into or removed from table A in the operation.
My first (naive) solution involves doing the work in two stages using table joins to produce the intermediate lists for first the delete and then the insert operations. The results of those queries are stored on the client and are required for correct application function.
SELECT * FROM A LEFT JOIN B ON A.value = B.value WHERE B.value IS NULL;
DELETE FROM A WHERE value IN ('paul', 'michael');
SELECT * FROM B LEFT JOIN A ON A.value = B.value WHERE A.value IS NULL;
INSERT INTO A (value) VALUES ('chris');
This simple approach does technically work: by the end of the transaction, table A will contain the same content as table B. But this strategy quickly becomes quite slow. To give an indication of the size of the tables, they are in the range of millions of rows, so performance at scale is a critical factor, and it would be nice to find a more optimal approach.
In order to address performance requirements, I plan to investigate the following:
Use of HStore back-end for optimal key-value storage performance.
Use of views for pre-calculating intermediate delete/insert queries.
Use of prepared queries to reduce SQL processing overhead.
My question to the experts is can you suggest what you consider to be the optimal strategy. Going slightly beyond the scope of my question, are there any hard and fast rules you can suggest?
Thank you so much for your time. All feedback is very welcome.
This is not perfect, but it works. The three cases (delete, update, insert) could possibly be combined into a full outer join.
DROP SCHEMA IF EXISTS tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE table_a (
zkey INTEGER NOT NULL PRIMARY KEY
, zvalue varchar NOT NULL
, CONSTRAINT a_zvalue_alt UNIQUE (zvalue)
);
INSERT INTO table_a(zkey, zvalue) VALUES
(1, 'dave' )
,(2, 'paul' )
,(3, 'michael' )
;
CREATE TABLE table_b (
zkey INTEGER NOT NULL PRIMARY KEY
, zvalue varchar NOT NULL
, CONSTRAINT b_zvalue_alt UNIQUE (zvalue)
);
INSERT INTO table_b(zkey, zvalue) VALUES
(1, 'dave' )
,(2, 'chris' )
,(5, 'Arnold' )
;
CREATE TABLE table_diff (
zkey INTEGER NOT NULL
, zvalue varchar NOT NULL
, opcode INTEGER NOT NULL DEFAULT 0
);
WITH xx AS (
DELETE FROM table_a aa
WHERE NOT EXISTS (
SELECT * FROM table_b bb
WHERE bb.zkey = aa.zkey
)
RETURNING aa.zkey, aa.zvalue
)
INSERT INTO table_diff(zkey,zvalue,opcode)
SELECT xx.zkey, xx.zvalue, -1
FROM xx
;
SELECT * FROM table_diff;
WITH xx AS (
UPDATE table_a aa
SET zvalue= bb.zvalue
FROM table_b bb
WHERE bb.zkey = aa.zkey
AND bb.zvalue <> aa.zvalue
RETURNING aa.zkey, aa.zvalue
)
INSERT INTO table_diff(zkey,zvalue,opcode)
SELECT xx.zkey, xx.zvalue, 0
FROM xx
;
SELECT * FROM table_diff;
WITH xx AS (
INSERT INTO table_a (zkey, zvalue)
SELECT bb.zkey, bb.zvalue
FROM table_b bb
WHERE NOT EXISTS (
SELECT * FROM table_a aa
WHERE bb.zkey = aa.zkey
AND bb.zvalue = aa.zvalue
)
RETURNING zkey, zvalue
)
INSERT INTO table_diff(zkey,zvalue,opcode)
SELECT xx.zkey, xx.zvalue, 1
FROM xx
;
SELECT * FROM table_a;
SELECT * FROM table_b;
SELECT * FROM table_diff;
Result:
INSERT 0 3
CREATE TABLE
INSERT 0 1
zkey | zvalue | opcode
------+---------+--------
3 | michael | -1
(1 row)
INSERT 0 1
zkey | zvalue | opcode
------+---------+--------
3 | michael | -1
2 | chris | 0
(2 rows)
INSERT 0 1
zkey | zvalue
------+--------
1 | dave
2 | chris
5 | Arnold
(3 rows)
zkey | zvalue
------+--------
1 | dave
2 | chris
5 | Arnold
(3 rows)
zkey | zvalue | opcode
------+---------+--------
3 | michael | -1
2 | chris | 0
5 | Arnold | 1
(3 rows)
BTW: the OQ is very vague about requirements. If the table_diff would be an actual history table, at least a timestamp-column should be added, and zkey and ztimestamp would be a natural choice for a key. Also, the whole process could be wrapped in a set of rules or triggers.
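The full outer join mentioned at the top of this answer can at least compute all three opcodes in one read-only pass. The following PostgreSQL sketch only classifies the differences and does not modify anything; the `a_demo` and `b_demo` temp tables are made-up copies of table_a and table_b, and the opcode values follow the -1 / 0 / 1 convention used above.

```sql
CREATE TEMP TABLE a_demo (zkey integer PRIMARY KEY, zvalue varchar NOT NULL);
CREATE TEMP TABLE b_demo (zkey integer PRIMARY KEY, zvalue varchar NOT NULL);
INSERT INTO a_demo (zkey, zvalue) VALUES (1, 'dave'), (2, 'paul'), (3, 'michael');
INSERT INTO b_demo (zkey, zvalue) VALUES (1, 'dave'), (2, 'chris'), (5, 'Arnold');

-- One pass over both tables: rows only in A are deletes (-1),
-- rows only in B are inserts (1), and changed values are updates (0).
SELECT COALESCE(a.zkey, b.zkey)     AS zkey,
       COALESCE(b.zvalue, a.zvalue) AS zvalue,
       CASE WHEN b.zkey IS NULL THEN -1
            WHEN a.zkey IS NULL THEN 1
            ELSE 0
       END                          AS opcode
FROM a_demo AS a
FULL OUTER JOIN b_demo AS b ON a.zkey = b.zkey
WHERE a.zvalue IS DISTINCT FROM b.zvalue
ORDER BY 1;
```

On these rows the query returns (2, chris, 0), (3, michael, -1) and (5, Arnold, 1), matching the final table_diff contents shown above. Applying the changes would still need separate DELETE, UPDATE and INSERT statements.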
Try using these queries:
DELETE FROM A
WHERE A.value NOT IN (SELECT B.value FROM B);
INSERT INTO A(value)
SELECT B.value
FROM B
WHERE B.value NOT IN (SELECT A.value FROM A)
With indexes on A.value and B.value, these queries will be really fast.
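One caveat with NOT IN is worth checking before relying on it: if the subquery can return a NULL, NOT IN matches nothing. The PostgreSQL sketch below contrasts it with NOT EXISTS; the `kv_a` and `kv_b` temp tables are made up, and the NULL row in `kv_b` is a hypothetical case that the question's data may never contain.

```sql
CREATE TEMP TABLE kv_a (value text);
CREATE TEMP TABLE kv_b (value text);
INSERT INTO kv_a (value) VALUES ('dave'), ('paul'), ('michael');
INSERT INTO kv_b (value) VALUES ('dave'), ('chris'), (NULL);  -- hypothetical NULL

-- NOT IN: comparing against the NULL yields unknown, so no row qualifies.
SELECT a.value
FROM kv_a AS a
WHERE a.value NOT IN (SELECT b.value FROM kv_b AS b);
-- returns 0 rows

-- NOT EXISTS: the NULL row simply never matches, so the missing values are found.
SELECT a.value
FROM kv_a AS a
WHERE NOT EXISTS (SELECT 1 FROM kv_b AS b WHERE b.value = a.value)
ORDER BY a.value;
-- returns michael, paul
```

If value is declared NOT NULL in both tables, the two forms return the same rows and the choice comes down to the plan the optimizer produces.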
If you have value indexed in both tables, and value is unique in each table, this is a case for a full outer join, which should be able to merge the two by walking through the indices:
SELECT CASE WHEN B.value IS NULL THEN
'DELETE FROM A WHERE A.value = ' || quote_literal(A.value)
ELSE
'INSERT INTO A(value) VALUES(' || quote_literal(B.value) || ')'
END
FROM A FULL OUTER JOIN B ON A.value = B.value
WHERE A.value IS DISTINCT FROM B.value
The SQL generation here is really just to demo what the output of the query is.