SQL Get count with most common part of string - sql

I am able to get count of column with most same values, e.g.
SELECT COUNT(*) AS Count, ProjectID
FROM Projects
GROUP BY ProjectID
ORDER BY Count DESC
So now I have table like this,
ProjectID ProjectUrl
1 http://www.CompanyA.com/Projects/123
2 http://www.CompanyB.com/Projects/124
3 http://www.CompanyA.com/Projects/125
4 http://www.CompanyB.com/Projects/126
5 http://www.CompanyA.com/Projects/127
Now Expected result without providing any parameter
ProjectUrl = http://www.CompanyA.com Count = 3
ProjectUrl = http://www.CompanyB.com Count = 2
Edit
Sorry I forgot to mention types of Urls I have in the table, Urls are quiet random though, but there are urls that are common. As we are creating project categories, so project category url can be,
https://spanish.CompanyAa2342.com/portal/projectA/projectTeamA/ProjectPersonA/Task/124
but for some projects there are no project team or so on, so it's bit random :?
I will need to query something more like generic.
What Url will have in common
http://ramdomLanguage.CompanyName.com/portal/RandomName.....

Please try:
select
Col,
COUNT(Col) Cnt
from(
select
SUBSTRING(ProjectUrl, 0, PATINDEX('%.com/%', ProjectUrl)+4) Col
from tbl
)x group by Col
SQL Fiddle Demo

Not sure of performance when dealing with a huge dataset, but this is a solution. I've tried to get a row for each portion of the URLs, delimited by /. Then do a quick aggregate at the end to bring up the counts of each individual part. Fiddle is here: http://www.sqlfiddle.com/#!3/742c4/12 (I've added one row for demo's sake - thanks, TechDo.)
WITH cteFSPositions
AS
(
SELECT ProjectID,
ProjectURL,
1 AS CharPos,
MAX(LEN(ProjectURL)) AS MaxLen,
CHARINDEX('/', ProjectURL) AS FSPos
FROM Projects
GROUP BY ProjectID,
ProjectURL
UNION ALL
SELECT ProjectID,
ProjectURL,
CharPos + 1,
MaxLen,
CHARINDEX('/', ProjectURL, CharPos + 1) AS FSPos
FROM cteFSPositions
WHERE CharPos <= MaxLen
),
cteProjectURLParts
AS
(
SELECT DISTINCT ProjectID,
LEFT(ProjectURL, FSPos) AS ProjectURLPart,
FSPos
FROM cteFSPositions
WHERE FSPos > 0
UNION ALL
SELECT ProjectID,
ProjectURL,
LEN(ProjectURL)
FROM Projects
),
cteFilteredProjectURLParts
AS
(
SELECT ProjectID,
ProjectURLPart
FROM cteProjectURLParts
WHERE ProjectURLPart NOT IN ('http:', 'http:/', 'http://', 'https:', 'https:/', 'https://')
)
SELECT ProjectURLPart,
COUNT(*) AS Instances
FROM cteFilteredProjectURLParts
GROUP BY ProjectURLPart
ORDER BY Instances DESC,
ProjectURLPart;
This produces (with the additional row I added in):
ProjectURLPart Instances
http://www.CompanyA.com/ 4
http://www.CompanyA.com/Projects/ 3
http://www.CompanyB.com/ 2
http://www.CompanyB.com/Projects/ 2
http://www.CompanyA.com/BlahblahBlah/ 1
http://www.CompanyA.com/BlahblahBlah/More1/ 1
http://www.CompanyA.com/BlahblahBlah/More1/More2 1
http://www.CompanyA.com/Projects/123 1
http://www.CompanyA.com/Projects/125 1
http://www.CompanyA.com/Projects/127 1
http://www.CompanyB.com/Projects/124 1
http://www.CompanyB.com/Projects/126 1
EDIT: Oops, original post had code of fiddle in progress. Have supplied the finalized code and updated fiddle link.
EDIT 2: Realized I was cutting off the end part of the URLS due to the way I was cutting the URLs up. For completeness' sake, I've added them back in to the final dataset. Updated fiddle as well.

Related

Change duplicate value in a column

Can you please tell me what SQL query can I use to change duplicates in one column of my table?
I found these duplicates:
SELECT Model, count(*) FROM Devices GROUP BY model HAVING count(*) > 1;
I was looking for information on exactly how to change one of the duplicate values, but unfortunately I did not find a specific option for myself, and all the more information is all in abundance filled by deleting the duplicate value line, which I don't need. Not strong in SQL at all. I ask for help. Thank you so much.
You can easily use a Window Functions such as ROW_NUMBER() with partitioning option in order to group by Model column to eliminate the duplicates, and then pick the first rows(rn=1) returning from the subquery such as
WITH d AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY Model) AS rn
FROM Devices
)
SELECT ID, Model -- , and the other columns
FROM d
WHERE rn = 1
Demo
use exists as follows:
update d
set Model = '-'
from Devices d
where exists (select 1 from device dd where dd.model = d.model and dd.id > d.id)
After the command:
SELECT Model, count (*) FROM Devices GROUP BY model HAVING count (*)> 1;
i get the result:
1895 lines = NULL;
3383 lines with duplicate values;
and all these values are 1243.
after applying your command:
update Devices set
Model = '-'
where id not in
(select
min(Devices .id)
from Devices
group by Devices.Model)
i got 4035 lines changed.
if you count, it turns out, (3383 + 1895) = 5278 - 1243 = 4035
and it seems like everything fits together, the result suits, it works.

Completely Unique Rows and Columns in SQL

I want to randomly pick 4 rows which are distinct and do not have any entry that matches with any of the 4 chosen columns.
Here is what I coded:
SELECT DISTINCT en,dialect,fr FROM words ORDER BY RANDOM() LIMIT 4
Here is some data:
**en** **dialect** **fr**
number SFA numero
number TRI numero
hotel CAI hotel
hotel SFA hotel
I want:
**en** **dialect** **fr**
number SFA numero
hotel CAI hotel
Some retrieved rows would have something similar with each other, like having the same en or the same fr, I would like to retrieved rows that do not share anything similar with each other, how do I do that?
I think I’d do this in the front end code rather the dB, here’s a pseudo code (don’t know what your node looks like):
var seenEn = “en not in (''“;
var seenFr = “fr not in (''“;
var rows =[];
while(rows.length < 4)
{
var newrow = sqlquery(“SELECT *
FROM table WHERE “ + seenEn + “) and ”
+ seenFr + “) ORDER BY random() LIMIT 1”);
if(!newrow)
break;
rows.push(newrow);
seenEn += “,‘“+ newrow.en + “‘“;
seenFr += “,‘“+ newrow.fr + “‘“;
}
The loop runs as many times as needed to retrieve 4 rows (or maybe make it a for loop that runs 4 times) unless the query returns null. Each time the query returns the values are added to a list of values we don’t want the query to return again. That list had to start out with some values (null) that are never in the data, to prevent a syntax error when concatenation a comma-value string onto the seenXX variable. Those syntax errors can be avoided in other ways like having a Boolean of “if it’s the first value don’t put the comma” but I chose to put dummy ineffective values into the sql to make the JS simpler. Same goes for the
As noted, it looks like JS to ease your understanding but this should be treated as pseudo code outlining a general algorithm - it’s never been compiled/run/tested and may have syntax errors or not at all work as JS if pasted into your file; take the idea and work it into your solution
Please note this was posted from an iphone and it may have done something stupid with all the apostrophes and quotes (turned them into the curly kind preferred by writers rather than the straight kind used by programmers)
You can use Rank or find first row for each group to achieve your result,
Check below , I hope this code will help you
SELECT 'number' AS Col1, 'SFA' AS Col2, 'numero' AS Col3 INTO #tbl
UNION ALL
SELECT 'number','TRI','numero'
UNION ALL
SELECT 'hotel','CAI' ,'hotel'
UNION ALL
SELECT 'hotel','SFA','hotel'
UNION ALL
SELECT 'Location','LocationA' ,'Location data'
UNION ALL
SELECT 'Location','LocationB','Location data'
;
WITH summary AS (
SELECT Col1,Col2,Col3,
ROW_NUMBER() OVER(PARTITION BY p.Col1 ORDER BY p.Col2 DESC) AS rk
FROM #tbl p)
SELECT s.Col1,s.Col2,s.Col3
FROM summary s
WHERE s.rk = 1
DROP TABLE #tbl

Recursive cte sql with for hierarchy level

I have a little problem with this recursive CTE, it works fine except when I have a user without root readable rights means no entry for this element. So if I run this query on a user with rights just on the leaves inside the tree the level part of this query won't work correctly.
It will show the real level hierarchy for example 6 but its the top first readable element for him so it should be 1.
WITH Tree
AS (
SELECT
id,
parent,
0 AS Level,
id AS Root,
CAST(id AS VARCHAR(MAX)) AS Sort,
user_id
FROM SourceTable
WHERE parent IS NULL
UNION ALL
SELECT
st.id,
st.parent,
Level + 1 AS Level,
st.parent AS Root,
uh.sort + '/' + CAST(st.id AS VARCHAR(20)) AS Sort,
st.user_id
FROM SourceTable AS st
JOIN Tree uh ON uh.id = st.parent
)
SELECT * FROM Tree AS t
JOIN UserTable AS ut ON ut.id = t.user_id AND ut.user_id = '141F-4BC6-8934'
ORDER BY Sort
the level is as follows
id level
5 0
2 1
7 2
4 2
1 2
6 1
3 2
8 2
9 3
When a user now just have read rights to id 8 and 9 the level from CTE stays at 2 for id 8 and 3 for id 9 but I need for id 8 level 1 if there is no one before
You haven't told us how you know whether a user has rights to a given id. That is a necessary piece of information. I'm going to put some code below that assumes you add a column to your query called hasRights and that this column will have a zero value if the user does not have rights and a value of one if they do. You may need to tweak this, since I have no data to test with but hopefully it will get you close.
Basically, the query is altered to only add 1 to the level if the user has rights. It also only adds to the sort path if the user has rights, otherwise an empty string is appended. So, if ids 8 and 9 are the only items the user has access to, you should see levels of 1 and 2 and sort paths similar to '5/8/9' rather than '5/6/8/9'. If you still aren't able to get it working, it would help us tremendously if you posted a sample schema on SqlFiddle.
WITH Tree
AS (
SELECT
id,
parent,
0 AS Level,
id AS Root,
hasRights AS HasRights,
CAST(id AS VARCHAR(MAX)) AS Sort,
user_id
FROM SourceTable
WHERE parent IS NULL
UNION ALL
SELECT
st.id,
st.parent,
Level + st.hasRights AS Level,
st.parent AS Root,
st.hasRights AS HasRights,
uh.sort + CASE st.hasRights WHEN 0 THEN '' ELSE '/' + CAST(st.id AS VARCHAR(20)) END AS Sort,
st.user_id
FROM SourceTable AS st
JOIN Tree uh ON uh.id = st.parent
)
SELECT * FROM Tree AS t
JOIN UserTable AS ut ON ut.id = t.user_id AND ut.user_id = '141F-4BC6-8934'
ORDER BY Sort
You requite something like if the higher level(0 or 1 ) is not in existence, then the next level become the higher level..
If yes, then you have to do this when the final result
insert all the results in temp table lets say #info (with same characteristics of data)
Now after all final data ready in the table,
Please check from the top.
Select * from #info where level= 0
if this returns 0 rows then you have to update each records level. to (level = level -1)
Now again same for Level=0, then level 1, then level 2 , then level 3 in recursion. this will be easy but not easy to code. So try without recursion then try final update.
I hope this will help :)
Please reply if you are looking for something else.
Try to perform the following select and let me know if it is your desired result:
SELECT *,
DENSE_RANK() OVER (PARTITION BY t.user_id ORDER BY t.LEVEL ASC) -1 as RelativeUserLevel
FROM Tree AS t
JOIN UserTable AS ut ON ut.id = t.user_id AND ut.user_id = '141F-4BC6-8934'
ORDER BY Sort
Is conversion of your table into hierarchical types an option:
hierarchyid data type (http://technet.microsoft.com/en-us/library/bb677290(v=sql.105).aspx)
or
xml data type
SOME TIMES option (maxrecursion 10000); is very useful
I don't have time to read your problem but here snipped
declare cursorSplit Cursor for
select String from dbo.SplitN(#OpenText,'~')
where String not in (SELECT [tagCloudStopWordText]
FROM [tagCloudStopList] where [langID]=#_langID)
option (maxrecursion 10000);
open cursorSplit
I'm sorry to be a party pooper and spoil the fun of creating such an interesting piece of SQL, but perhaps you should load all relevant access data into your application and determine the user levels in the application?
I bet it would result in more maintainable code..

T-SQL Sum Values of Like Rows

I currently use this select statement in SSRS to report Recent Demand and Days of Inventory to end users.
select Issue.MATERIAL_NUMBER,
SUM(Issue.SHIPPED_QTY)AS DEMAND_QTY,
Main.QUANTITY_TOTAL_STOCK / SUM(Issue.SHIPPED_QTY) * 122 AS [DOI]
From AGS_DATAMART.dbo.GOODS_ISSUE AS Issue
join AGS_DATAMART.dbo.OPR_MATERIAL_DIM AS MAT on MAT.MATERIAL_NUMBER = Issue.MATERIAL_NUMBER
join AGS_DATAMART.dbo.SCE_ECC_MAIN_FINAL_INV_FACT AS MAIN on MAT.MATERIAL_SID = MAIN.MATERIAL_SID
join AGS_DATAMART.dbo.SCE_PLANT_DIM AS PLANT on PLANT.PLANT_SID = MAIN.PLANT_SID
Where Issue.SHIP_TO_CUSTOMER_ID = #CUSTID
and Issue.ACTUAL_PGI_DATE > GETDATE() - 122
and PLANT.PLANT_CODE = #CUSTPLANT
and MAIN.STORAGE_LOCATION = '0001'
Group by Issue.MATERIAL_NUMBER,Main.QUANTITY_TOTAL_STOCK
Pretty Simple.
But is has come to my attention, that they have similar Material Numbers whos values need to be combined.
Material | Qty
0242-55161W 1
0242-55161 3
The two Material Numbers above should be combined and reported as 0242-55161 Qty 4.
How do I combine rows like this? This is just 1 of many queries that will need to be adjusted. Is it possible?
EDIT - The similar material will always be the base number plus the "W", if that matters.
Please note I am brand new to SQL and SSRS, and this is my first time posting here.
Let me know if I need to include any other details.
Thanks in advance.
Answer;
Using just replace, it kept returning 2 unique lines even when using SUM.
I was able to get the desired result using the following. Can you see anything wrong with this method?
with Issue_Con AS
(
select replace(Issue.MATERIAL_NUMBER,'W','') As [MATERIAL_NUMBER],
Issue.SHIPPED_QTY AS [SHIPPED_QTY]
From AGS_DATAMART.dbo.GOODS_ISSUE AS Issue
Where Issue.SHIP_TO_CUSTOMER_ID = #CUSTSHIP
and Issue.SALES_ORDER_TYPE_CODE = 'ZTPC'
and Issue.ACTUAL_PGI_DATE > GETDATE() - 122
)
select Issue_Con.MATERIAL_NUMBER,
SUM(Issue_Con.SHIPPED_QTY)AS [DEMAND_QTY],
Main_Con.QUANTITY_TOTAL_STOCK / SUM(Issue_Con.SHIPPED_QTY) * 122 AS [DOI]
From Issue_Con
join Main_Con on Main_Con.MATERIAL_Number = Issue_Con.MATERIAL_Number
Group By Issue_Con.MATERIAL_NUMBER, Main_Con.QUANTITY_TOTAL_STOCK;
You need to replace Issue.MATERIAL_NUMBER in the select and group by with something else. What that something else is depends on your data.
If it's always 10 digits with anything afterwards ignored, then you can use substr(Issue.MATERIAL_NUMBER, 1, 10)
If the extraneous character is always W and there are no Ws in the proper number, then you can use replace(Issue.MATERIAL_NUMBER, 'W', '')
If it's anything from the first alphabetic character, then you can use case when patindex('%[A-Za-z]%', Issue.MATERIAL_NUMBER) = 0 then Issue.MATERIAL_NUMBER else substr(Issue.MATERIAL_NUMBER, 1, patindex('%[A-Za-z]%', Issue.MATERIAL_NUMBER)) end
You could group your data by this expression instead of MATERIAL_NUMBER:
CASE SUBSTRING(MATERIAL_NUMBER, LEN(MATERIAL_NUMBER), 1)
WHEN 'W' THEN LEFT(MATERIAL_NUMBER, LEN(MATERIAL_NUMBER) - 1)
ELSE MATERIAL_NUMBER
END
That is, check if the last character is W. If it is, return all but the last character, otherwise return the entire value.
To avoid repeating the same expression twice (once in GROUP BY and once in SELECT) you could use a subselect, for example like this:
select Issue.MATERIAL_NUMBER_GROUP,
SUM(Issue.SHIPPED_QTY)AS DEMAND_QTY,
Main.QUANTITY_TOTAL_STOCK / SUM(Issue.SHIPPED_QTY) * 122 AS [DOI]
From (
SELECT
*,
CASE SUBSTRING(MATERIAL_NUMBER, LEN(MATERIAL_NUMBER), 1)
WHEN 'W' THEN LEFT(MATERIAL_NUMBER, LEN(MATERIAL_NUMBER) - 1)
ELSE MATERIAL_NUMBER
END AS MATERIAL_NUMBER_GROUP
FROM AGS_DATAMART.dbo.GOODS_ISSUE
) AS Issue
join AGS_DATAMART.dbo.OPR_MATERIAL_DIM AS MAT on MAT.MATERIAL_NUMBER = Issue.MATERIAL_NUMBER
join AGS_DATAMART.dbo.SCE_ECC_MAIN_FINAL_INV_FACT AS MAIN on MAT.MATERIAL_SID = MAIN.MATERIAL_SID
join AGS_DATAMART.dbo.SCE_PLANT_DIM AS PLANT on PLANT.PLANT_SID = MAIN.PLANT_SID
Where Issue.SHIP_TO_CUSTOMER_ID = #CUSTID
and Issue.ACTUAL_PGI_DATE > GETDATE() - 122
and PLANT.PLANT_CODE = #CUSTPLANT
and MAIN.STORAGE_LOCATION = '0001'
Group by Issue.MATERIAL_NUMBER_GROUP,Main.QUANTITY_TOTAL_STOCK

SQL to query by date dependencies

I have a table of patients which has the following columns: patient_id, obs_id, obs_date. Obs_id is the ID of a clinical observation (such as weight reading, blood pressure reading....etc), and obs_date is when that observation was taken. Each patient could have several readings on different dates...etc. Currently I have a query to get all patients that had obs_id = 1 and insert them into a temporary table (has two columns, patient_id, and flag which I set to 0 here):
insert into temp_table (select patient_id, 0 from patients_table
where obs_id = 1 group by patient_id having count(*) >= 1)
I also execute an update statement to set the flag to 1 for all patients that also had obs_id = 5:
UPDATE temp_table SET flag = 1 WHERE EXISTS (
SELECT patient_id FROM patients_table WHERE obs_id = 5 group by patient_id having count(*) >=1
) v WHERE temp_table.patient_id = v.patient_id
Here's my question: How do I modify both queries (without combining them or removing the group by statement) such that I can answer the following question:
"get all patients who had obs_id = 5 after obs_id = 1". If I add a min(obs_date) or max(obs_date) to the select of each query and then add "AND v.obs_date > temp_table.obs_date" to the second one, is that correct??
The reason why I need not remove the group by statement or combine is because these queries are generated by a code generator (from a web app), and i'd like to do that modification without messing up the code generator or re-writing it.
Many thanks in advance,
The advantage of SQL is that it works with sets. You don't need to create temporary tables or get all procedural.
As you describe the problem (find all patients who have obs_id 5 after obs_id 1), I'd start with something like this
select distinct p1.patient_id
from patients_table p1, patients_table p2
where
p1.obs_id = 1 and
p2.obs_id = 5 and
p2.patient_id = p1.patient_id and
p2.obs_date > p1.obs_date
Of course, that doesn't help you deal with your code generator. Sometimes, tools that make things easier can also get in the way.