The sections in a submitted audit's XML occur in the following order:
... other items.
software
services
keys
routes
I am seeing a lot of occurrences of a submitted audit (by the domain audit or list audit scripts) not completing. It usually fails in the software or services section. As a result, the next time an audit for that system is submitted and completes, OAv2 assumes "oh, none of these services, keys or routes exist - they must all be new".
This happens because when an audit stops partway through, having got as far as (say) the software section, the following sections do not get processed and the individual rows in the relevant tables (services, keys & routes) do not get their timestamps updated.
So - I'm seeing a fair bit of this.
I've tried throwing extra resources at the OAv2 server - it doesn't seem to make a difference.
I have altered the DB slightly to add an extra field in the sys_man_audits table. Now, as each section of the XML audit is processed, an update is posted to this field. So, when (say) the software section of the XML is about to be processed, the row for the audit in sys_man_audits has its audit_debug field updated to simply show "software". The final update to this row, upon completion of the audit, is to simply remove the contents of that field. The result is that I can check the table for any rows that contain data in the audit_debug field. If any do, then the audit has not finished, and the last section it started processing is noted in the field.
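To make the marker idea concrete, here is a rough sketch of it in Python. This is not the actual OAv2 code (which is PHP against MySQL); it uses sqlite3 as a stand-in, and the audit_id column, the handlers dict and both function names are made up for illustration. Only sys_man_audits and audit_debug come from the change described above.

```python
import sqlite3

# Section order as listed at the top of this post ("other items" omitted).
SECTIONS = ["software", "services", "keys", "routes"]


def process_audit(conn, audit_id, handlers):
    """Process each section in order, noting the current one in audit_debug.

    handlers is a hypothetical dict mapping section name -> callable that
    does the real work for that section.
    """
    for section in SECTIONS:
        # Write the marker *before* processing, so if processing dies
        # the section name is left behind in the row.
        conn.execute(
            "UPDATE sys_man_audits SET audit_debug = ? WHERE audit_id = ?",
            (section, audit_id),
        )
        conn.commit()
        handlers[section]()
    # Final update: clear the marker to show the audit completed.
    conn.execute(
        "UPDATE sys_man_audits SET audit_debug = '' WHERE audit_id = ?",
        (audit_id,),
    )
    conn.commit()


def unfinished_audits(conn):
    """Return (audit_id, last_section) for every audit that never finished."""
    return conn.execute(
        "SELECT audit_id, audit_debug FROM sys_man_audits "
        "WHERE audit_debug <> '' ORDER BY audit_id"
    ).fetchall()
```

Any row returned by unfinished_audits() is an audit that stopped partway, with the section it was in when it stopped.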
So...
I tried it this morning on an audit_list.vbs run with ~50 systems in the list, processing 8 at a time. It was sending info to an OAv2 server on my local machine (a desktop Core2Duo @ 2.33GHz with 2GB memory and a normal SATA drive). Out of the ~50 systems, 6 systems report that they did not finish and the last section was "services".
My desktop should have plenty of power to process these systems (8 at a time) without timing out. If I re-audit any of the failed systems, they complete successfully - so that would indicate it's not bad data related (FYI - I have also fixed once-and-for-all the UTF-8 issues). I'm scratching my head a bit here - so any thoughts would be appreciated. Is anyone else seeing a lot of false "alerts" that look like all the services / keys / routes on a system are newly installed, when you know they are not?
I am thinking (assuming I fail to work out the actual cause) that I can implement a hack to fix this. When an audit is submitted, check to see if the last audit on that system failed. If so, update the timestamps on the relevant (subsequent) tables to reflect the last audit timestamp.
This would work - but it's ugly and would create a dependency on the audit results being submitted in a specific order. i.e. if it failed on SOFTWARE, then I need to update software, services, keys & routes. If it failed on SERVICES, I would need to update services, keys & routes. It would also fail to account for any real changes in those tables as at the previous (failed) audit.
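Just to show what that order dependency looks like, the "which tables do I need to touch" part of the hack boils down to something like the sketch below. The function name and the list are mine, not anything in OAv2, and it assumes the section order given at the top of this post never changes, which is exactly the fragile part.

```python
# Assumed fixed order of the sections in the audit XML.
SECTION_ORDER = ["software", "services", "keys", "routes"]


def tables_to_refresh(failed_section):
    """Return the failed section plus every section that comes after it.

    These are the tables whose row timestamps would need bumping to the
    last audit timestamp. An unknown or empty section returns an empty list.
    """
    name = failed_section.lower()
    if name not in SECTION_ORDER:
        return []
    return SECTION_ORDER[SECTION_ORDER.index(name):]
```

So tables_to_refresh("SERVICES") gives services, keys and routes, and tables_to_refresh("SOFTWARE") gives all four. If the audit scripts ever submit sections in a different order, this list would be wrong.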
I don't like that idea, but am at a loss to explain why the audits are failing and may be forced to implement it.
If anyone can offer thoughts around this, please, please do post here.
In my mind the "alerts" feature of OAv2 is one of its most compelling. We use it here to track unauthorised changes on our server fleet. It is a valuable feature that I've not seen in other products.
Help