Monday, March 19, 2012

Data Parse Challenge

Thanks in Advance,
I would like to parse a column of ntext that looks something like this:
'I was born on Sept. 14, 1960. I graduated college on August 2, 1982,
and went to France on Jul 17, 1985. I died on December 30, 2001.'
I would like to break it into units that each end with the date and
look like this:
I was born on Sept. 14, 1960
. I graduated college on August 2, 1982
, and went to France on Jul 17, 1985
. I died on December 30, 2001
The months will not be formatted consistantly, with the most common
forms above . The dates in each row will vary. Does anyone have a
suggestion how to approach this?
Mark
Napa, CACreate a temp list of years (or even put them in a table),
Do a loop through the statement and every time one of the years correlates
with the a year from the list insert a line break
Jack Vamvas
___________________________________
Receive free SQL tips - www.ciquery.com/sqlserver.htm
<xxxdbaxxx@.gmail.com> wrote in message
news:1145607044.910151.144310@.t31g2000cwb.googlegroups.com...
> Thanks in Advance,
> I would like to parse a column of ntext that looks something like this:
> 'I was born on Sept. 14, 1960. I graduated college on August 2, 1982,
> and went to France on Jul 17, 1985. I died on December 30, 2001.'
> I would like to break it into units that each end with the date and
> look like this:
> I was born on Sept. 14, 1960
> . I graduated college on August 2, 1982
> , and went to France on Jul 17, 1985
> . I died on December 30, 2001
> The months will not be formatted consistantly, with the most common
> forms above . The dates in each row will vary. Does anyone have a
> suggestion how to approach this?
> Mark
> Napa, CA
>|||Thanks, Jack. How do I then break out the portions with the line
breaks?|||Try this:
CREATE TABLE #control ( control_no INT PRIMARY KEY )
DECLARE @.i INT
SET NOCOUNT ON
SET @.i = 1900
-- Add control numbers to temp table
WHILE @.i Between 1900 And 2050
BEGIN
INSERT #control VALUES( @.i )
SET @.i = @.i + 1
END
SET NOCOUNT OFF
DROP TABLE #tmp
CREATE TABLE #tmp ( diary_id INT IDENTITY, diary_text VARCHAR(500) )
SET NOCOUNT ON
INSERT INTO #tmp VALUES ( 'I was born on Sept. 14, 1960. I graduated
college on August 2, 1982, and went to France on Jul 17, 1985. I died on
December 30, 2001.' )
INSERT INTO #tmp VALUES ( 'I was born on Jan 13 1975. I graduated
university on June 1, 1996, and went to Spain on September 3, 1997. I died
on ' + CONVERT( VARCHAR, GETDATE(), 106 ) )
SET NOCOUNT OFF
DROP TABLE #positions
CREATE TABLE #positions ( pos_id INT IDENTITY, diary_id INT, start INT,
finish INT )
-- Insert the finish position
INSERT INTO #positions ( diary_id, finish )
SELECT
t.diary_id,
PATINDEX( '%' + CAST( c.control_no AS VARCHAR ) + '%', diary_text ) + 4
AS finish
FROM #tmp t, #control c
WHERE PATINDEX( '%' + CAST( c.control_no AS VARCHAR ) + '%', diary_text ) >
0
-- Update the start position
UPDATE p1
SET p1.start = ISNULL( p2.finish + 1, 1 )
FROM #positions p1
LEFT JOIN #positions p2 ON p1.diary_id = p2.diary_id
AND p2.pos_id = p1.pos_id - 1
-- SELECT * FROM #positions
skip:
SELECT t.diary_id, SUBSTRING( diary_text, p.start, ((p.finish - p.start )+
1) )
FROM #tmp t
INNER JOIN #positions p ON t.diary_id = p.diary_id
-- OR
SELECT t.diary_id, RTRIM( LTRIM( SUBSTRING( diary_text, p.start, ((p.finish
- p.start )) ) ) )
FROM #tmp t
INNER JOIN #positions p ON t.diary_id = p.diary_id
Let me know how you get on.
Damien
"xxxdbaxxx@.gmail.com" wrote:

> Thanks in Advance,
> I would like to parse a column of ntext that looks something like this:
> 'I was born on Sept. 14, 1960. I graduated college on August 2, 1982,
> and went to France on Jul 17, 1985. I died on December 30, 2001.'
> I would like to break it into units that each end with the date and
> look like this:
> I was born on Sept. 14, 1960
> .. I graduated college on August 2, 1982
> , and went to France on Jul 17, 1985
> .. I died on December 30, 2001
> The months will not be formatted consistantly, with the most common
> forms above . The dates in each row will vary. Does anyone have a
> suggestion how to approach this?
> Mark
> Napa, CA
>|||I don't have a solution using TSQL.
But from the information you have given, I will assume that the data is
already in a database column.
I suggest that you run a query and get all the data, and work on parsing the
text at the client, -NOT- the server. You can then use a Programming
language better designed for parsing text, TSQL is not really designed for
this, and you might anger the SQL Saints in doing so.
Additional Info: If you need to do the parsing on the Server:
If you have SQL Server 2005 installed and you know a .NET language you could
use the base class library to help you out with this problem. Parsing dates
using a .NET base class library would be much easier and faster.
You could then return the formatted text or whatever using a CLR stored
Procedure.
Russell Mangel
Las Vegas, NV
<xxxdbaxxx@.gmail.com> wrote in message
news:1145607044.910151.144310@.t31g2000cwb.googlegroups.com...
> Thanks in Advance,
> I would like to parse a column of ntext that looks something like this:
> 'I was born on Sept. 14, 1960. I graduated college on August 2, 1982,
> and went to France on Jul 17, 1985. I died on December 30, 2001.'
> I would like to break it into units that each end with the date and
> look like this:
> I was born on Sept. 14, 1960
> . I graduated college on August 2, 1982
> , and went to France on Jul 17, 1985
> . I died on December 30, 2001
> The months will not be formatted consistantly, with the most common
> forms above . The dates in each row will vary. Does anyone have a
> suggestion how to approach this?
> Mark
> Napa, CA
>|||use CHAR(13)+CHAR(10)
In other words as your parsing through the string , check the last 4
characters , if they match with one of the years you have in the list add
CHAR(13)+CHAR(10) and this will create the effect your looking for
--
Jack Vamvas
___________________________________
Receive free SQL tips - www.ciquery.com/sqlserver.htm
<xxxdbaxxx@.gmail.com> wrote in message
news:1145607598.389905.80650@.v46g2000cwv.googlegroups.com...
> Thanks, Jack. How do I then break out the portions with the line
> breaks?
>|||Sorry forgot to post the test code
DECLARE @.var1 NVARCHAR(1000)
SET @.var1 = (select CONVERT(NVARCHAR(1000),test1) from recoveryTest where
testId = 2)
DECLARE @.var1Length INT,@.counter INT,@.statement NVARCHAR(1000),@.tempSub
NVARCHAR(1)
SET @.var1Length = LEN(@.var1)
SET @.counter = 1
SET @.statement = ''
WHILE @.counter <= @.var1Length
BEGIN
--print @.counter
SET @.tempSub = RTRIM(SUBSTRING(@.var1,@.counter,@.counter)
)
SET @.statement = @.statement + @.tempSub
IF RTRIM(SUBSTRING(@.statement,@.counter-3,@.counter)) IN
('1980','2001','2003') set @.statement =@.statement + char(10)
SET @.counter = @.counter + 1
END
PRINT @.statement
Jack Vamvas
___________________________________
Receive free SQL tips - www.ciquery.com/sqlserver.htm
<xxxdbaxxx@.gmail.com> wrote in message
news:1145607598.389905.80650@.v46g2000cwv.googlegroups.com...
> Thanks, Jack. How do I then break out the portions with the line
> breaks?
>|||Thanks Damien. This works, I had to modify to accept values from a
table of course. One thing I am working on however is if the same year
occurs twice in the same row, then it fails to parse the second year.
Any thoughts on that?|||Well, this works but it starting to look a bit messy. I've changed to
CHARINDEX as it has a start_location optional argument.
DROP TABLE #control
CREATE TABLE #control ( control_no INT PRIMARY KEY )
DECLARE @.i INT
SET NOCOUNT ON
SET @.i = 1900
-- Add control numbers to temp table
WHILE @.i Between 1900 And 2050
BEGIN
INSERT #control VALUES( @.i )
SET @.i = @.i + 1
END
SET NOCOUNT OFF
DROP TABLE #tmp
CREATE TABLE #tmp ( diary_id INT IDENTITY, diary_text VARCHAR(500) )
SET NOCOUNT ON
INSERT INTO #tmp VALUES ( 'I was born on Sept. 14, 1960. I graduated
college on August 2, 1982, and went to France on Jul 17, 1985. I died on
December 30, 2001.' )
INSERT INTO #tmp VALUES ( 'I was born on Jan 13 1975. I graduated
university on June 1, 1996, and went to Spain on September 3, 1997. I died
on ' + CONVERT( VARCHAR, GETDATE(), 106 ) + ' and went to heaven on ' +
CONVERT( VARCHAR, GETDATE(), 106 ) + '.' )
SET NOCOUNT OFF
DROP TABLE #positions
CREATE TABLE #positions ( pos_id INT IDENTITY, diary_id INT, start INT,
finish INT )
-- Insert the finish position
INSERT INTO #positions ( diary_id, finish )
SELECT
t.diary_id,
CHARINDEX( CAST( c.control_no AS VARCHAR ), diary_text ) + 4
AS finish
FROM #tmp t, #control c
WHERE CHARINDEX( CAST( c.control_no AS VARCHAR ), diary_text ) > 0
UNION
SELECT
t.diary_id,
CHARINDEX( CAST( c.control_no AS VARCHAR ), diary_text, ( CHARINDEX(
CAST( c.control_no AS VARCHAR ), diary_text ) ) + 1 ) + 4
AS finish
FROM #tmp t, #control c
WHERE CHARINDEX( CAST( c.control_no AS VARCHAR ), diary_text, ( CHARINDEX(
CAST( c.control_no AS VARCHAR ), diary_text ) ) + 1 ) > 0
-- Update the start position
UPDATE p1
SET p1.start = ISNULL( p2.finish + 1, 1 )
FROM #positions p1
LEFT JOIN #positions p2 ON p1.diary_id = p2.diary_id
AND p2.pos_id = p1.pos_id - 1
-- SELECT * FROM #positions
skip:
SELECT t.diary_id, SUBSTRING( diary_text, p.start, ((p.finish - p.start )+
1) )
FROM #tmp t
INNER JOIN #positions p ON t.diary_id = p.diary_id
-- OR
SELECT t.diary_id, RTRIM( LTRIM( SUBSTRING( diary_text, p.start, ((p.finish
- p.start )) ) ) )
FROM #tmp t
INNER JOIN #positions p ON t.diary_id = p.diary_id
Tell me this isn't your homework! Let me know how you get on.
Damien
"xxxdbaxxx@.gmail.com" wrote:

> Thanks Damien. This works, I had to modify to accept values from a
> table of course. One thing I am working on however is if the same year
> occurs twice in the same row, then it fails to parse the second year.
> Any thoughts on that?
>|||Damien (Damien@.discussions.microsoft.com) writes:
> Well, this works but it starting to look a bit messy. I've changed to
> CHARINDEX as it has a start_location optional argument.
Alas, there is a problem with this, since the data was an ntext column.
charindex does not work on ntext beyond the 4000th character.
I would suggest that this problem is best handled client-side, at least
in SQL 2000.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

No comments:

Post a Comment