DVM Troubleshooting

This playbook provides detailed instructions for resolving common DVM problems. Knowing what to do when your system displays certain symptoms can greatly reduce data loss.

First Step

As soon as your DVM goes down, contact DefenseStorm. Depending on our agreement with you, we may already have been notified of the issue and started remediation.

Rebooting the DVM should be the last step taken. While the DVM is down, it still attempts to send logs. Once the connection is restored, all logs sent over TCP while the DVM was down are delivered to the console, and they display correctly because we use the ingestion timestamp, not the time of connection.

Note for Cloud-Based Services

If your DVM goes down, we still receive logs from cloud-based services such as Office365 and OpenDNS. These logs flow from their cloud directly to our cloud, not through the DVM. If your laptop uses the DefenseStorm Windows Agent (DWA), it also sends data directly to our cloud, bypassing the DVM. Rebooting the DVM causes all logs from the time it went down until the time it is back up to be lost.

DVM Alerts and Troubleshooting Resolutions

DVM Disk Queue Malfunctioning

The DVM Disk Queue is the service on the DVM that attempts to write incoming events to disk during a long-term outbound network connection failure, that is, when the DVM cannot send events to the Amazon SQS cloud. This generally happens when the internet connection goes out while the DVM still has local network access.

Initial Troubleshooting Steps

  •  Run this query: app_name:pvm_stats AND -category:alert
  •  Look at the "pdiskqueue" field. If it says DOWN, you need to restart the pdiskqueue service.
  •  Look at the "syslog_ng" field, which is more important than the "pdiskqueue" field. If it says "UP", you are receiving the pvm_stats heartbeat events normally.
  •  Look at the "Dropped_events" and "Stored_events" counts recorded in the DVM stats messages since the incident. If the stored event volume is close to zero, the DVM appears to be healthy and operating normally at this time.

What needs to be done?

To fully resolve this issue, restart the "pdiskqueue" service and restart the DVM by following the instructions listed below.

  1. From the DVM main menu, select option 10 (Bash shell).
  2. Run these commands:
    1. sudo /etc/init.d/pdiskqueue restart
    2. exit (returns to the DVM menu)
  3. Get the DVM status by selecting option 8 from the main menu.
  4. Check whether this action re-enabled the disk queue.

Reset DVM Clock

Run the following commands:

cat /etc/cron.daily/ntpupdate

ntpdate -s -u pool.ntp.org

Disk Almost Full

Symptom: The DVM warns that the disk is full.

Cause (possible): Logs do not rotate off the system as expected, or update packages are downloaded more than once and therefore take up double the space.

Mitigation Steps: Manually clean the logs off the system or remove installers for already applied update packages.

STEP 1: Determine if the disk is full because of a log rotation error

In the Bash Shell (DVM menu, option 10), check /var/log for runaway log file size using the shell command below.

ls -lhrS /var/log
List files in /var/log, sorted by size ascending (largest at end).

Inspect any very large files at the end of the list by invoking the following command in the shell:

tail -n 40 filenamehere
Print the last 40 lines of the file.

Note any errors for discussion with DefenseStorm support, and delete these log files to reclaim space.  
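
The size check above can also be scripted. The following is a minimal sketch, not part of the DVM tooling: the function name largest_files is hypothetical, and it only looks at readable regular files directly inside the given directory (run it with sudo if some logs are root-only).

```shell
#!/bin/sh
# Sketch: list the N largest readable files directly inside a directory,
# largest first. Output format: <size in bytes><TAB><path>
# largest_files is a hypothetical helper name, not a DVM command.
largest_files() {
    dir="${1:-/var/log}"
    count="${2:-5}"
    for f in "$dir"/*; do
        # Skip directories and files we cannot read.
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s\t%s\n' "$(( $(wc -c < "$f") ))" "$f"
    done | sort -rn | head -n "$count"
}

# Usage on the DVM:
#   largest_files /var/log 5
```

Unlike ls -lhrS, this prints the largest file first, so the candidates for inspection with tail appear at the top.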

STEP 2: Determine if the disk is full because of already applied update packages

Run the sudo apt-get autoremove --purge command. This first runs an analysis estimating the reclaimable space, and can be cancelled at this point before any permanent removal is triggered.

Space commands

df                             display disk usage by device
du -h directorypath            display size of directory, append / to display per file
uname -r                       display active DVM kernel version (useful when comparing packages on disk)
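
As a quick way to read the df output, the sketch below filters it for file systems at or above a usage threshold. The helper name full_mounts and the 90 percent default are assumptions for illustration; the function expects the portable df -P layout on standard input.

```shell
#!/bin/sh
# Sketch: read `df -P` output on stdin and print every mount point whose
# usage is at or above a threshold (percent). full_mounts is a
# hypothetical helper name; 90 is an arbitrary default.
full_mounts() {
    threshold="${1:-90}"
    awk -v limit="$threshold" 'NR > 1 {
        pct = $5                 # "Capacity" column, for example 95%
        sub(/%/, "", pct)
        if (pct + 0 >= limit + 0) print $6 " " pct "%"
    }'
}

# Usage on the DVM:
#   df -P | full_mounts 90
```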

Sudo commands

These commands require the DVM administrator to enter the DVM login password.

sudo du -x / | sort -n                 get size of all file objects, then sort by size (largest last)
sudo du -hx | sort -h                  the same, optionally in human-readable format
sudo apt-get autoremove --purge        remove all unused / already installed packages

DVM Hung Upon Boot

Applicable Versions: DVM 1.1.5 and below (Ubuntu LTS 12.04, 14.04), VMware Fusion 5 and 6.

Symptom: Error - Host SMBus controller not enabled

Cause (possible): VMware does not provide that level of interface for CPU access, but Ubuntu tries to load the kernel module anyway.

Mitigation Steps: These mitigation steps work for VMware Fusion 5 and 6, and Ubuntu LTS 12.04 and 14.04.

  1. Reboot the DVM and watch for the GRUB splash screen to appear.
  2. Press ESC at the GRUB prompt.
  3. Press 'e' to edit.
  4. Highlight the line that begins "ubuntu ......... or kernel (recovery mode)". If you have multiple versions with recovery mode, select the topmost version marked as recovery mode.
    1. Press 'e'.
  5. Highlight the "Kernel...." line and press 'e'.
  6. Replace "ro single" with "rw init=/bin/bash".
    1. Press Enter.
  7. Press 'b' to boot the system.
  8. You are now in a bash shell.
    1. Open this file: vi /etc/modprobe.d/blacklist.conf
    2. Add the following lines to the bottom of the file:
      • blacklist i2c-piix4
      • blacklist piix4_smbus
      • blacklist intel_rapl
  9. Reboot the DVM again.

Syslog Configuration Problems

Symptom: /var/log/syslog-ng.log is flooded with error lines such as "maximum connections reached; rejecting connection. Maximum concurrent connections: 500."

Mitigation Steps: Increase Max Connections

STEP 1: Check DVM configuration

From the DVM menu, open a Bash Shell (option 10), then run the following command to view the DVM configuration file. 

On DVM 1.2.0+: nano /etc/praesidio/praesidio.conf
On DVM 1.1.5 or below: vi /etc/praesidio/praesidio.conf
Uses nano (DVM v.1.2.0+) or vi (DVM 1.1.5 or below) to view the main configuration file.

Navigate to the SyslogNG section, and inspect the maxconnections values. If a value is lower than the number of machines registered to the DVM, error log spam can overflow the log file size (and possibly fill the disk).

/etc/praesidio/praesidio.conf: Relevant section and default values

Tcp514maxconnections = 100
Tcp516maxconnections = 100
Tcp601maxconnections = 500
Tcp1602maxconnections = 500
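
The comparison between the maxconnections values and your logging host count can be sketched as a small shell function. This is an illustration only: low_maxconnections is a hypothetical helper name, the host count is a number you supply, and the parsing assumes the "name = value" layout shown above.

```shell
#!/bin/sh
# Sketch: print every maxconnections setting in a config file whose value
# is below the number of logging hosts. low_maxconnections is a
# hypothetical helper; it assumes "name = value" lines as shown above.
low_maxconnections() {
    conf="$1"     # path to the configuration file
    hosts="$2"    # number of machines that log to the DVM
    awk -F'=' -v hosts="$hosts" '
        tolower($1) ~ /maxconnections/ {
            name = $1; value = $2
            gsub(/[ \t]/, "", name)
            gsub(/[ \t\r]/, "", value)
            if (value + 0 < hosts + 0) print name " " value
        }' "$conf"
}

# Usage on the DVM, assuming 800 logging hosts:
#   low_maxconnections /etc/praesidio/praesidio.conf 800
```

Any setting printed by the function is a candidate for the increase described in the next step; no output means every limit already covers the host count.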

STEP 2: Increase the number of connections for the port in use (for example, to 1000)

Change the port connection values as needed in the editor to accommodate your logging host count; save the file and exit back to the shell once complete.

STEP 3: Run "sudo /usr/local/bin/pConfig --syslog" to reconfigure the DVM to use the new settings

STEP 4: Run "sudo service syslog-ng restart" to restart syslog-ng

STEP 5: Log clean up and final check

  • Delete the error.log and error.log.1 files in /var/log (sudo rm error.log error.log.1).
  • Reboot the box (sudo reboot).
  • DVM Console --> Get DVM Status
  • DVM Console --> Troubleshooting

STEP 6: Post-reboot log check

Bash Shell --> Check the contents of error.log in /var/log.

  • Run the command tail -n 50 error.log

STEP 7: Post-reboot monitoring of syslog

Syslog-ng.conf (syslogng)

  •  Inspect the file /etc/syslog-ng/conf.d/praesidio.conf

Frequent DVM Reboot Alerts 

Symptom: The DVM sends unusually frequent reboot alerts.

Cause: The reboot-required flag is set when an OS package upgrade requests it. The DVM is hardcoded to check for security updates daily. Until all packages have been updated, the alert may continue to display frequently.

Mitigation Steps: Enable the automatic DVM reboot feature. This only reboots the DVM if a security update has been applied in the last day that requires it.

STEP 1: Select Option (11) Configure Automatic Security Updates

Within the DVM’s Main Menu, select option 11 to enable Automatic Security Updates.

STEP 2: Select Automatic Reboot Time

After you enable automatic reboot, set the reboot time.

DVM Down Soon After Reboot

Symptom: After being rebooted, the DVM went down again after an hour.

Root cause: The disk filled up because of log spam from syslog-ng, caused by too many client connections. For example, if the maximum pool (port 601) is set to 500, and there are 800 machines configured for communication with the DVM, the error spam fills the disk and prevents syslog-ng from starting. As a result, event data never reaches the DefenseStorm platform.

Mitigation Steps: The following actions brought the DVM back to a stable state and should prevent the problem from recurring in the near future.

STEP 1: Run the command >> df -h

  • /dev/sd1: 17 GB, 1.1 GB free

STEP 2: Run the command >> du -h /var/log

  • 16G /var/log

STEP 3: Run the command >> cd /var/log

STEP 4: Run the command >> tail python_sqs.log (output was blank)

STEP 5: Run the command >> ls -lh

  • error.log 4.5G
  • error.log.1 (Sep 18) 5.8G
  • syslog 3.#G

STEP 6: Run the command >> tail error.log

  • "Rejecting connection from client: maximum connection attempts reached"
    • IPs listed are local bank IPs.
      • Desktops, Windows servers, various hardware

STEP 7: Run the command >> du -h .

  • 16G /var/log
  • ….

syslog-ng.log: maximum connections reached; rejecting connection. Maximum concurrent connections: 500.
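
To gauge how much of a log consists of these rejection messages, a case-insensitive count is usually enough. The sketch below is an illustration; count_rejections is a hypothetical helper name, and the search phrase is taken from the two messages quoted above.

```shell
#!/bin/sh
# Sketch: count the lines in a log file that report a rejected connection.
# count_rejections is a hypothetical helper name. The match is
# case-insensitive so it covers both message variants quoted above.
count_rejections() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -ci 'rejecting connection' "$1" || true
}

# Usage on the DVM:
#   count_rejections /var/log/error.log
```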

STEP 8: Identify the syslog config problem

Correct the syslog config problem before purging logs and restoring customer asset connectivity to the DVM.

  1. Check the DVM configuration (the SyslogNG section), and inspect the maxconnections values. If a value is lower than the number of machines registered to the DVM, error log spam can overflow the log file size (and possibly fill the disk).


[SyslogNG] (port 514, 516, 601…..)

Tcp514maxconnections = 100

Tcp516maxconnections = 100

Tcp601maxconnections = 500

>> sudo vi /etc/praesidio/praesidio.conf

  • Changed entries to 1000 on each port.
>> /etc/syslog-ng/conf.d/praesidio.conf

STEP 9: Mitigation steps for this file:

For all config regions that look like:

network(
max_connections(100)

STEP 10: Change the max_connections value to: max_connections(1000)

STEP 11: Repeat for the other conf sections for 516 and 601

STEP 12: Delete the error.log and error.log.1 files.

Run the following command >> sudo rm error.log error.log.1

STEP 13: Reboot the box

Run the following command >>  sudo reboot

STEP 14: DVM Console → Get DVM Status

  • All services up, 37% disk usage, queues are empty

STEP 15: DVM Console → Verification of resolution

  1. Ran connectivity test, all green now
  2. Post-reboot log check:
    1. Bash Shell → Check contents of error.log in /var/log.
  3. Run command >> tail -n 50 error.log
    1. Just NTP errors observed. No issues with IPs right now.

Verify Data Flow

STEP 1: Open a shell session on the DVM.

STEP 2: From the DVM, run the following command:

sudo tcpdump -vvv -s 4096 -X host <IP address> and port 514

Here, the IP address is that of the device sending log data, and the port is the port it is sending to, which is typically 514 or 516.
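
To make the full filter expression explicit, the sketch below assembles the command for one device and port and prints it for review instead of running it. The function name capture_command is hypothetical, and 192.0.2.10 is a documentation-only address standing in for your sending device.

```shell
#!/bin/sh
# Sketch: print the tcpdump command for one sending device and port so it
# can be reviewed before it is run. capture_command is a hypothetical
# helper name; the port defaults to 514.
capture_command() {
    ip="$1"
    port="${2:-514}"
    printf 'sudo tcpdump -vvv -s 4096 -X host %s and port %s\n' "$ip" "$port"
}

# Example with a placeholder device address sending to port 516:
capture_command 192.0.2.10 516
```

If packets from the device appear in the capture, the log data is reaching the DVM on that port.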

Increase Partition Size

For the purpose of this procedure, we are increasing the primary partition to 22 GB. Always make a snapshot or backup of the current instance first, in case something goes wrong.

STEP 1: Log into the DVM and go to the command line.

STEP 2: Turn off swap

sudo swapoff --all --verbose

STEP 3: Remove swap partition

sudo parted /dev/sda rm 2

STEP 4: Resize root partition

sudo parted /dev/sda resizepart 1 yes 24000

STEP 5: Interactively make new swap partition

praesidio@ubuntu:~$ sudo parted /dev/sda mkpart
Partition type? primary/extended? primary
File system type? [ext2]? linux-swap
Start? 24001
End? 25000

STEP 6: Make swap filesystem

sudo mkswap /dev/sda2

STEP 7:  Turn swap back on

sudo swapon --all --verbose

STEP 8: Resize root filesystem

sudo resize2fs /dev/sda1 22000M

STEP 9: Check that filesystem has grown

df -h

STEP 10: Reboot

sudo reboot

STEP 11: After reboot check that filesystem is still 22GB

df -h

STEP 12: Check that swap is present (you should see a little over 900M of swap)

free -h
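
The numbers in this procedure mix units, which the following arithmetic sketch tries to make visible. It assumes that parted reads unit-less values as decimal megabytes (its default input unit) and that resize2fs and free -h use binary units; both assumptions are worth checking against the man pages on your DVM. The helper name mb_to_mib is hypothetical, and the partition start offset and swap header are ignored.

```shell
#!/bin/sh
# Sketch: convert decimal megabytes (assumed parted input unit) to
# mebibytes (assumed unit of resize2fs "M" and free -h), rounding down.
# mb_to_mib is a hypothetical helper name.
mb_to_mib() {
    echo $(( $1 * 1000000 / 1048576 ))
}

mb_to_mib 24000                  # root partition end: prints 22888
mb_to_mib $(( 25000 - 24001 ))   # swap partition size: prints 952
```

Under these assumptions the 22000M file system fits inside the resized root partition, and the swap size matches the 900-odd M reported by free -h.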

High CPU Usage

Symptom: The DVM appears to be running hot with a low number of CPUs.

Cause: While copying the syslog file, the DVM got stuck in a bad state.

Mitigation Steps: If the syslog-ng service is what is stuck, you can restart it without having to reboot the whole system. The following steps should return the DVM to a healthy state and normal CPU usage.

STEP 1: Enter the bash shell, and execute the following:

sudo top

This provides a table with all running processes. At least one process should display a high CPU percentage.

STEP 2: Restart the stuck process or DVM

If the syslog-ng service is the stuck process, you can restart it without rebooting the whole system by executing the following command:

sudo /etc/init.d/syslog-ng restart

If syslog-ng is not the stuck process, you can try rebooting the DVM itself. If that still does not return the DVM to healthy CPU usage, escalate the issue to DefenseStorm.
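
When an interactive top session is inconvenient, the same check can be sketched as a filter over ps output. This is an illustration only: hot_processes is a hypothetical helper name, the 80 percent threshold is arbitrary, and only the first word of each command name is printed.

```shell
#!/bin/sh
# Sketch: read "%CPU COMMAND" lines on stdin (as printed by
# `ps -eo pcpu,comm`) and print commands at or above a CPU threshold.
# hot_processes is a hypothetical helper name; 80 is an arbitrary default.
hot_processes() {
    threshold="${1:-80}"
    # The header line evaluates to 0 and is filtered out by the comparison.
    awk -v limit="$threshold" '$1 + 0 >= limit + 0 { print $2 " " $1 }'
}

# Usage on the DVM:
#   ps -eo pcpu,comm | hot_processes 80
```

If syslog-ng shows up in the output, restarting the service as described above is the less disruptive option.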

Missed Events or Event Lag

Symptom: The console may be missing event logs, or events may arrive late.

Cause (possible): Traffic is too high, the compression setting is incorrect, or the Windows Agent profile needs to be updated.

Mitigation Steps: If you believe your DVM may be dropping events, you can follow these mitigation steps to create a fallback for network monitoring.

STEP 1: nload

  1. nload provides a good picture of overall network utilization in real time, displayed per network interface. It is useful for determining whether a single NIC is saturated.
  2. Install steps:
    1. sudo apt-get update
    2. sudo apt-get upgrade
    3. sudo apt-get install nload

STEP 2: iftop

  1. iftop provides a good picture of overall network utilization, displayed per connection (host pair). It is useful for determining which hosts are using bandwidth and for identifying unexpected sources.
  2. Install steps:
    1. sudo apt-get install iftop


iftop, run by itself, drops into a console monitoring mode. For the DVM, you’ll want to identify the connections that are:

  • Receiving traffic on syslog ports
  • Sending traffic out to an external AWS IP over port 443 (https)

Screenshots of the various values here, along with the measured totals from vnstat, should help us understand the characteristics of the network better.


STEP 3: mtr (on Ubuntu: mtr-tiny)

  1. mtr can be used to trace network routes and obtain reporting data. It is useful for determining whether the path to our SQS server (Amazon US-West) is congested, and whether the source of the issue is within the customer network, the customer ISP, or an external network location altogether.
  2. Install steps:
    1. sudo apt-get install mtr-tiny

Additional info (external link to Linode’s website): https://www.linode.com/docs/networking/diagnostics/diagnosing-network-issues-with-mtr


Disk Queue Down During Start-up

Sometimes during the initial DVM startup, the screen displays a disk write error.

To correct this error, reboot the DVM by selecting Option 9 (Reboot) from the DVM main menu.