Ipswitch WS_FTP ftp logs export

Hi everybody, hope you enjoy reading our blog! This time I will write about Ipswitch WS_FTP ftp server and how did we extract logs from it.

One of our customer has very old ws_ftp setup (ver 7.1) and they decided to perform migration to another FTP server solution. As I told before, it is very old setup with 200+ users, who are not documented. Ftp is extensively used, so customer decided to analyze logs for last year and maybe clean some of the users and inform others about future changes in FTP setup. There are 2 ways to get a logs:

  • From PostgreSQL database
  • From log viewer built into WS_FTP

Unfortunately, the password from PostgreSQL was unknown so the only possible way for us was to use log viewer. The only export format for log viewer is xml. It is not user-friendly and does not allow regular user to make any filtering. In addition exported xml was about 4.5Gb. So not possible to open with excel. Ms Access with some stupid errors, even when I split file to parts

Untitled_0

So I decided to export data to MS SQL Server table and convert it to flat format from xml. Here is an example on how xml looked

<?xml version=”1.0″ encoding=”utf-8″ ?>
<log>
<entry>
<log_time>20150311-12:45:02</log_time>
<description><![CDATA[Timeout session:  605,109375 secs inactivity, Login state]]></description>
<service>S1</service>
<sessionid>16278752</sessionid>
<type>0</type>    <severity>1</severity>
<user>user1</user>
<host>ftps.srv.com</host>
<lstnconnaddr>12.28.27.13:990</lstnconnaddr>
<cliconnaddr>39.16.77.36:36738</cliconnaddr>
<cmd>timeout</cmd>
<params><![CDATA[MLSD ]]>
<sguid>B4CFC212-7C36-438F-EEA7-188BD3EACAA3</sguid>
</entry>
</log>

Therefore, I made an empty database and a table inside


create table ftplog (

[log_time] nvarchar (17),

[description] nvarchar (100),

[user] nvarchar (30),

[host] nvarchar (20),

[lstnconnaddr] nvarchar (30),

[cliconnaddr] nvarchar (30),

[cmd] nvarchar (30))

GO


The length of the fields can be different, especially for “description”, however I used max values to cover every possible case

I also made a query to import a data from xml file to table so it is written as table, but not xml


use FTPdatabase

GO

DECLARE @messagebody XML

 

SELECT @messagebody = BulkColumn

FROM OPENROWSET(BULK ‘C:\year.xml’, SINGLE_CLOB) AS X –change to your path

 

INSERT INTO [ftplog]

select  a.value(N'(./log_time)[1]’, N’nvarchar (17)’) as [log_time],

a.value(N'(./description[1])’, N’nvarchar (30)’)) as [description],

a.value(N'(./user[1])’, N’nvarchar (30)’) as [user],

a.value(N'(./host[1])’, N’nvarchar (30)’) as [host],

a.value(N'(./lstnconnaddr[1])’, N’nvarchar(30)’) as [lstnconnaddr],

a.value(N'(./cliconnaddr[1])’, N’nvarchar (30)’) as [cliconnaddr],

a.value(N'(./cmd[1])’, N’nvarchar (30)’) as [command]

from @messagebody.nodes(‘/log/entry’) as r(a);


This query is designed for this specific xml structure, so for different xml it need to be changed. As you can see I skip some unnecessary fields from xml, like sessionid, host, sguid and so on.

After the first run, I got an error message that it cannot read some symbols, sure, it cannot! FTP was installed on Swedish, and I had different settings on my laptop, so I had to change language for non Unicode programs to be Swedish and restart laptop. By the way, before laptop I was trying to import data to SQL Server installed on virtual machine, which is used quite a lot by different people. Query was running for years, so I decided to use my laptop with SSD and 12gigs of RAM.

Untitled_1

On the small files import query was running fine, but when I started it against xml with data for the last year it gave me an error message

Msg 9420, Level 16, State 1, Line 5

XML parsing: line 42131402, character 49, illegal xml character

 

It is not possible to view 4Gb file in notepad. First I tried to use powershell, but it took whole memory and was running for years. Therefore I made a vbs script to read a line with specific number


Set objFSO=CreateObject(“Scripting.FileSystemObject”)

‘ Put your path to xml here

filename = “C:\Desktop\FTP\year.xml”

 

Set fso = CreateObject(“Scripting.FileSystemObject”)

Set f = fso.OpenTextFile(filename)

 

for k = 1 to 42131400   ‘ In my case error happened on line 42131400 and I was not sure how vbs count the ‘strings so I did output for 2 strings prior to 42131402

strline = f.ReadLine

Next

 

wscript.echo “String with number: ” & k & “has value” & “strline”

strline = f.ReadLine

wscript.echo “String with number: ” & k+1 & “has value” & “strline”

strline = f.ReadLine

wscript.echo “String with number: ” & k+2 & “has value” & “strline”

 

f.Close


Unfortunately, even with this tool I was not able to identify the problem. I was able to read a string, but not to identify what is wrong with it. Here is what I got reading the string. Yeah, there are some Unicode symbols, but anyway it should be possible to read them. Here is example of string output from script:

<description><![CDATA[Unknown command : ŸÔŸÝ]]></description>

After googling a bit I found out that it might be a problem with unreadable Unicode table characters (1-32). So I made another vbs script to remove such characters from my source file.


Set objFSO=CreateObject(“Scripting.FileSystemObject”)

‘ How to write file

outFile=”C:\FTP\fixed1.xml”   ‘ Keep results in this file

Set objFile = objFSO.OpenTextFile(outFile,2)

filename = “C:\Desktop\FTP\year.xml” ‘ Source file

 

Set fso = CreateObject(“Scripting.FileSystemObject”)

Set f = fso.OpenTextFile(filename)

i = 0

Do Until f.AtEndOfStream

 

strline = f.ReadLine ‘ Read each string of the source file

for j=1 to 32              ‘ Change 32 to be 31, so you do not loose spaces

strline=Replace(strline,Chr(j),””) ‘Remove unreadable characters

Next

objFile.WriteLine strLine  ‘ Write fixed string to output file

i = i+1

Loop

wscript.echo “Number of strings is: ” & i

f.Close

objFile.Close


After cleaning a file I got error message from SQL

Msg 9421, Level 16, State 1, Line 5

XML parsing: line 1, character 13, illegal name character

 

Using the first vbs script I red first 3 lines and they looked exactly fine

<?xmlversion=”1.0″encoding=”utf-8″?>

<log>

<entry>

However, after bit more research I discovered that the first string is missing spaces (unreadable character with code 32 that was removed by vbs script. I was not worried about spaces anywhere so just added them to the first string

<?xml version=”1.0″ encoding=”utf-8″ ?>

 

I tried running SQL import batch again and it still was not working with error message

Msg 9420, Level 16, State 1, Line 5

XML parsing: line 1461998912, character 29, illegal xml character

Did not feel comfortable with reading strings with vbs script and spending so much time on that, so I did some research and it appeared that there are quite a lot of nice tools who can read huge text files.

One of them is EmEditor was fit for my needs with fully functional 30 days trial. So now I was able to open and browse huge xml file using nice GUI interface!!!

During file opening got an error message that some characters cannot be converted using the specified encoding.

Untitled_3

Wow, this tool saved hours for me and I was able to identify problematic symbol, it appeared to be �

Using EmEditor I replaced this symbol with nothing and saved changes

 

Finally, I was able to run SQL import query and after 30 minutes I got a table containing 6+ million records ready for queries and sorting!

For customer I exported same xml file to MS Access database table. Compressed it consumed just 60Mb

Hope you enjoyed this long story. I do not think that someone will need to reproduce all of my steps, but some of them might be useful for different purpose. Looking forward for your feedbacks and do not forget to follow our blog. Promise, soon it will update more often and it will be a lot of interesting content!