Efficient Storage Automation in Proxmox with the proxmox_storage Module

Managing storage systems in Proxmox environments often involves recurring tasks: creating new storage, connecting NFS or CIFS shares and iSCSI targets, or integrating more complex backends such as CephFS or Proxmox Backup Server. In larger environments with multiple nodes or entire clusters, this can quickly become time-consuming, error-prone, and difficult to track.

With Ansible, these processes can be efficiently automated and standardized. Instead of manual configuration, Infrastructure as Code ensures a clear structure, reproducibility, and traceability of all changes. The relatively new proxmox_cluster module already automates the creation of Proxmox clusters and the joining of nodes; the same approach now applies to storage systems. This is precisely where the Ansible module proxmox_storage, developed by our highly esteemed colleague Florian Paul Azim Hoberg (also well-known in the open-source community as gyptazy), comes into play. It enables the simple and flexible integration of various storage types directly into Proxmox nodes and clusters: automated, consistent, and repeatable at any time. The module is already part of the community.proxmox Ansible collection and has been included since version 1.3.0.

This makes storage management in Proxmox faster and more secure, and integrates it seamlessly into modern automation workflows.

Ansible Module: proxmox_storage

The proxmox_storage module is an Ansible module developed in-house at credativ for automated storage management in Proxmox VE. It supports various storage types such as NFS, CIFS, iSCSI, CephFS, and Proxmox Backup Server.

The module allows you to create new storage resources, adjust existing configurations, and completely automate the removal of no longer needed storage. Its integration into Ansible Playbooks enables idempotent and reproducible storage management in Proxmox nodes and clusters. The module simplifies complex configurations and reduces sources of error that can occur during manual setup.

Add iSCSI Storage

Integrating iSCSI storage into Proxmox enables centralized access to block-based storage that can be flexibly used by multiple nodes in the cluster. By using the proxmox_storage module, the connection can be configured automatically and consistently, which saves time and prevents errors during manual setup.

- name: Add iSCSI storage to Proxmox VE Cluster
  community.proxmox.proxmox_storage:
    api_host: proxmoxhost
    api_user: root@pam
    api_password: password123
    validate_certs: false
    nodes: ["de-cgn01-virt01", "de-cgn01-virt02", "de-cgn01-virt03"]
    state: present
    type: iscsi
    name: net-iscsi01
    iscsi_options:
      portal: 10.10.10.94
      target: "iqn.2005-10.org.freenas.ctl:s01-isci01"
    content: ["rootdir", "images"]

The integration takes place within a single task, where the consuming nodes and the iSCSI-relevant information are defined. It is also possible to define for which “content” this storage should be used.

Add Proxmox Backup Server

The Proxmox Backup Server (PBS) is also considered storage in Proxmox VE and can therefore be integrated into the environment just like other storage types. With the proxmox_storage module, a PBS can be easily integrated into individual nodes or entire clusters, making backups available centrally, consistently, and automatically.

- name: Add PBS storage to Proxmox VE Cluster
  community.proxmox.proxmox_storage:
    api_host: proxmoxhost
    api_user: root@pam
    api_password: password123
    validate_certs: false
    nodes: ["de-cgn01-virt01", "de-cgn01-virt02"]
    state: present
    name: backup-backupserver01
    type: pbs
    pbs_options:
      server: proxmox-backup-server.example.com
      username: backup@pbs
      password: password123
      datastore: backup
      fingerprint: "F3:04:D2:C1:33:B7:35:B9:88:D8:7A:24:85:21:DC:75:EE:7C:A5:2A:55:2D:99:38:6B:48:5E:CA:0D:E3:FE:66"
      export: "/mnt/storage01/b01pbs01"
    content: ["backup"]

Note: Pay attention to the fingerprint of the Proxmox Backup Server system. It must be defined whenever the instance’s certificate was not issued by a trusted root CA. If you operate your own root CA and the nodes trust it, this definition is not necessary.

Remove Storage

No longer needed or outdated storage can be removed just as easily from Proxmox VE. With the proxmox_storage module, this process is automated and performed idempotently, ensuring that the cluster configuration remains consistent and unused resources are cleanly removed. A particular advantage is evident during storage migrations, as old storage can be removed in a controlled manner after successful data transfer. This way, environments can be gradually modernized without manual intervention or unnecessary configuration remnants remaining in the cluster.

- name: Remove storage from Proxmox VE Cluster
  community.proxmox.proxmox_storage:
    api_host: proxmoxhost
    api_user: root@pam
    api_password: password123
    validate_certs: false
    state: absent
    name: net-nfsshare01
    type: nfs

Conclusion

The example of automated storage integration with Ansible and Proxmox impressively demonstrates the advantages and extensibility of open-source solutions. Open-source products like Proxmox VE and Ansible can be flexibly combined, offering an enormous range of applications that also prove their worth in enterprise environments.

A decisive advantage is the independence from individual manufacturers, meaning companies do not have to fear vendor lock-in and retain more design freedom in the long term. At the same time, it becomes clear that the successful implementation of such scenarios requires sound knowledge and experience to optimally leverage the possibilities of open source.

While this only covers a partial area, our colleague Florian Paul Azim Hoberg (gyptazy) impressively demonstrates here in his video “Proxmox Cluster Fully Automated: Cluster Creation, NetApp Storage & SDN Networking with Ansible” what full automation with Proxmox can look like.

This is precisely where we stand by your side as a partner and gladly support you in the areas of automation, development, as well as with all questions regarding Proxmox and modern infrastructures. Do not hesitate to contact us – we would be happy to advise you!

Automated Proxmox Subscription Handling with Ansible

When deploying Proxmox VE in enterprise environments, whether for new locations, expanding existing clusters, or migrating from platforms like VMware, automation becomes essential. These scenarios typically involve rolling out dozens or even hundreds of nodes across multiple sites. Manually activating subscriptions through the Proxmox web interface is not practical at this scale.

To ensure consistency and efficiency, every part of the deployment process should be automated from the beginning. This includes not just the installation and configuration of nodes and the automated cluster creation, but also the activation of the Proxmox subscription. In the past, this step often required manual interaction, which slowed down provisioning and introduced unnecessary complexity.

Now there is a clean solution to this. With the introduction of the new Ansible module proxmox_node, subscription management is fully integrated. This module allows you to handle subscription activation as part of your Ansible playbooks, making it possible to automate the entire process without ever needing to open the web interface.

This improvement is particularly valuable for mass deployments, where reliability and repeatability matter most. Every node can now be automatically configured, licensed, and production-ready right after boot. It is a great example of how Proxmox VE continues to evolve into a more enterprise-friendly platform, while still embracing the flexibility and openness that sets it apart.

Ansible Module: proxmox_node

With automation becoming more critical in modern IT operations, managing Proxmox VE infrastructure through standardized tools like Ansible has become a common practice. Until now, while there were various community modules available to interact with Proxmox resources, node-level management often required custom workarounds or direct SSH access. That gap has now been closed with the introduction of the new proxmox_node module.

This module was developed by our team at credativ GmbH, specifically by our colleague known in the community under the handle gyptazy. It has been contributed upstream and is already part of the official Ansible Community Proxmox collection, available to anyone using the collection via Ansible Galaxy or automation controller integrations.

The proxmox_node module focuses on tasks directly related to the lifecycle and configuration of a Proxmox VE node. What makes this module particularly powerful is that it interacts directly with the Proxmox API, without requiring any SSH access to the node. This enables a cleaner, more secure, and API-driven approach to automation.

The module currently supports several key features that are essential in real-world operations:

Managing Subscription Licenses
One of the standout features is the ability to automatically upload and activate a Proxmox VE subscription key. This is incredibly helpful for enterprises rolling out clusters at scale, where licensing should be handled consistently and automatically as part of the provisioning workflow.
Controlling Power States
Power management of nodes can now be handled via Ansible, making it easy to start (via Wake-on-LAN) or shut down nodes as part of playbook-driven maintenance tasks or during automated cluster operations.
Managing DNS Configuration
DNS settings such as resolvers and search domains can be modified declaratively, ensuring all nodes follow the same configuration policies without manual intervention.
Handling X509 Certificates
The module also allows you to manage TLS certificates used by the node. Whether you’re deploying internal PKI-signed certificates or using externally issued ones, the proxmox_node module lets you upload and apply them through automation in a clean and repeatable way.

By bringing all of this functionality into a single, API-driven Ansible module, the process of managing Proxmox nodes becomes much more reliable and maintainable. You no longer need to script around pveproxy with shell commands or use SSH just to manage node settings.

Subscription Integration Example

Adding a subscription to a Proxmox VE node is as simple as the following task. While this shows the easiest way for a single node, this can also be used in a loop over a dictionary holding the related subscriptions for each node.

- name: Place a subscription license on a Proxmox VE Node
  community.proxmox.proxmox_node:
    api_host: proxmoxhost
    api_user: gyptazy@pam
    api_password: password123
    validate_certs: false
    node_name: de-cgn01-virt01
    subscription:
      state: present
      key: ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ0123456789
Conclusion

For us at credativ, this module fills a real gap in the automation landscape around Proxmox and demonstrates how missing features in open-source projects can be addressed effectively by contributing upstream. It also reinforces the broader movement of managing infrastructure declaratively, where configuration is versioned, documented, and easily reproducible.

In combination with other modules from the community Proxmox collection like our recent proxmox_cluster module, proxmox_node helps complete the picture of a fully automated Proxmox VE environment — from cluster creation and VM provisioning to node configuration and licensing. If you’re looking for help or assistance for creating Proxmox VE based virtualization infrastructures, automation or custom development to fit your needs, we’re always happy to help! Feel free to contact us at any time.

Efficient Proxmox Cluster Deployment through Automation with Ansible

Manually setting up and managing servers is usually time-consuming, error-prone, and difficult to scale. This becomes especially evident during large-scale rollouts, when building complex infrastructures, or during the migration from other virtualization environments. In such cases, traditional manual processes quickly reach their limits. Consistent automation offers an effective and sustainable solution to these challenges.

Proxmox is a powerful virtualization platform known for its flexibility and comprehensive feature set. When combined with Ansible, a lightweight and agentless automation tool, the management of entire system landscapes becomes significantly more efficient. Ansible allows for the definition of reusable configurations in the form of playbooks, ensuring that deployment processes are consistent, transparent, and reproducible.

To enable fully automated deployment of Proxmox clusters, our team member, known in the open-source community under the alias gyptazy, has developed a dedicated Ansible module called proxmox_cluster. This module handles all the necessary steps to initialize a Proxmox cluster and add additional nodes. It has been officially included in the upstream Ansible Community Proxmox collection and is available for installation via Ansible Galaxy starting with version 1.1.0. As a result, the manual effort required for cluster deployment is significantly reduced. Further insights can be found in his blog post titled “How My BoxyBSD Project Boosted the Proxmox Ecosystem”.

By adopting this solution, not only can valuable time be saved, but a solid foundation for scalable and low-maintenance infrastructure is also established. Unlike fragile task-based approaches that often rely on Ansible’s shell or command modules, this solution leverages the full potential of the Proxmox API through a dedicated module. As a result, it can be executed in various scopes and does not require SSH access to the target systems.

This automated approach makes it possible to deploy complex setups efficiently while laying the groundwork for stable and future-proof IT environments. Such environments can be extended at a later stage and are built according to a consistent and repeatable structure.

Benefits

Using the proxmox_cluster module for Proxmox cluster deployment brings several key advantages to modern IT environments. The focus lies on secure, flexible, and scalable interaction with the Proxmox API, improved error handling, and simplified integration across various use cases:

Use of the native Proxmox API
Full support for the Proxmox authentication system
- API Token Authentication support
No SSH access required
Usable in multiple scopes:
- From a dedicated deployment host
- From a local system
- Within the context of the target system itself
Improved error handling through API abstraction

Ansible Proxmox Module: proxmox_cluster

The newly added proxmox_cluster module in Ansible significantly simplifies the automated provisioning of Proxmox VE clusters. With just a single task, it enables the seamless creation of a complete cluster, reducing complexity and manual effort to a minimum.

Creating a Cluster

Creating a cluster now requires only a single Ansible task using the proxmox_cluster module:

- name: Create a Proxmox VE Cluster
  community.proxmox.proxmox_cluster:
    state: present
    api_host: proxmoxhost
    api_user: root@pam
    api_password: password123
    validate_certs: false
    link0: 10.10.1.1
    link1: 10.10.2.1
    cluster_name: "devcluster"

Afterwards, the cluster is created and additional Proxmox VE nodes can join the cluster.

Joining a Cluster

Additional nodes can now also join the cluster using a single task. When combined with the use of a dynamic inventory, it becomes easy to iterate over a list of nodes from a defined group and add them to the cluster within a loop. This approach enables the rapid deployment of larger Proxmox clusters in an efficient and scalable manner.

- name: Join a Proxmox VE Cluster
  community.proxmox.proxmox_cluster:
    state: present
    api_host: proxmoxhost
    api_user: root@pam
    api_password: password123
    master_ip: "{{ primary_node }}"
    fingerprint: "{{ cluster_fingerprint }}"
    cluster_name: "devcluster"

Cluster Join Information

In order for a node to join a Proxmox cluster, the cluster’s join information is generally required. To avoid defining this information manually for each individual cluster, this step can also be automated. As part of this feature, a new module called proxmox_cluster_join_info has been introduced. It allows the necessary data to be retrieved automatically via the Proxmox API and made available for further use in the automation process.

- name: List existing Proxmox VE cluster join information
  community.proxmox.proxmox_cluster_join_info:
    api_host: proxmox1
    api_user: root@pam
    api_password: "{{ password | default(omit) }}"
    api_token_id: "{{ token_id | default(omit) }}"
    api_token_secret: "{{ token_secret | default(omit) }}"
  register: proxmox_cluster_join

Conclusion

While automation in the context of virtualization technologies is often focused on the provisioning of guest systems or virtual machines (VMs), this approach demonstrates that automation can be applied at a much deeper level within the underlying infrastructure. It is also possible to fully automate scenarios in which nodes are initially deployed using a customer-specific image with Proxmox VE preinstalled, and then proceed to automatically create the cluster.

As an official Proxmox partner, we are happy to support you in implementing a comprehensive automation strategy tailored to your environment and based on Proxmox products. You can contact us at any time!

Introduction

Puppet is a software configuration management solution to manage IT infrastructure. One of the first things to be learnt about Puppet is its domain-specific language – the Puppet-DSL – and the concepts that come with it.

Users can organize their code in classes and modules and use pre-defined resource types to manage resources like files, packages, services and others.

The most commonly used types are part of the Puppet core, implemented in Ruby. Composite resource types may be defined via Puppet-DSL from already known types by the Puppet user themself, or imported as part of an external Puppet module which is maintained by external module developers.

It so happens that Puppet users can stay within the Puppet-DSL for a long time even when they deal with Puppet on a regular basis.

The first time I had a glimpse into custom type and provider development was when Debian Stable was shipping Puppet 5.5, which was not too long ago. The Puppet 5.5 documentation includes chapters on custom type and provider development, but to me they felt incomplete and lacking self-contained examples. Apparently I was not the only one feeling that way, even though Puppet’s gem documentation is a good overview of what is possible in principle.

Gary Larizza’s blog post on the subject was published more than ten years ago. I recently had another look at the Puppet 7 documentation on that topic, as this is the Puppet version in the current Debian Stable.

The Puppet 5.5 way of developing types & providers is now called the low-level method, and its documentation has not changed significantly. However, Puppet 6 and later recommend a new method for creating custom types & providers via the so-called Resource-API, whose documentation is a major improvement over the low-level method’s. The Resource-API is not a replacement, though, and has several documented limitations.

Nevertheless, for the remainder of this blog post, we will re-prototype a small portion of the built-in file type’s functionality, namely the ensure and content properties, using the low-level method as well as the Resource-API.

Preparations

The following preparations are not necessary in an agent-server setup. We use bundle to obtain a puppet executable for this demo.

demo@85c63b50bfa3:~$ cat > Gemfile <<EOF
source 'https://rubygems.org'

gem 'puppet', '>= 6'
EOF

demo@85c63b50bfa3:~$ bundle install
Fetching gem metadata from https://rubygems.org/........
Resolving dependencies...
...
Installing puppet 8.10.0
Bundle complete! 1 Gemfile dependency, 17 gems now installed.
Bundled gems are installed into `./.vendor`

demo@85c63b50bfa3:~$ cat > file_builtin.pp <<EOF
$file = '/home/demo/puppet-file-builtin'
file {$file: content => 'This is madness'}
EOF

demo@85c63b50bfa3:~$ bin/puppet apply file_builtin.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.01 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-builtin]/ensure: defined content as '{sha256}0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e'
Notice: Applied catalog in 0.04 seconds

demo@85c63b50bfa3:~$ sha256sum /home/demo/puppet-file-builtin
0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e /home/demo/puppet-file-builtin

Keep in mind the state change information printed by Puppet.

Low-level prototype

Custom types and providers that are not installed via a Gem need to be part of some Puppet module, so they can be copied to Puppet agents via the pluginsync mechanism.

A common location for Puppet modules is the modules directory inside a Puppet environment. For this demo, we declare a demo module.

Basic functionality

Our first attempt is the following type definition for a new type we will call file_llmethod. It has no documentation or validation of input values.

# modules/demo/lib/puppet/type/file_llmethod.rb

Puppet::Type.newtype(:file_llmethod) do
  newparam(:path, namevar: true) {}
  newproperty(:content) {}
  newproperty(:ensure) do
    newvalues(:present, :absent)
    defaultto(:present)
  end
end

We have declared a path parameter that serves as the namevar for this type – there cannot be other file_llmethod instances managing the same path. The ensure property is restricted to two values and defaults to present.

The following provider implementation consists of a getter and a setter for each of the two properties content and ensure.

# modules/demo/lib/puppet/provider/file_llmethod/ruby.rb

Puppet::Type.type(:file_llmethod).provide(:ruby) do
  def ensure
    File.exist?(@resource[:path]) ? :present : :absent
  end

  def ensure=(value)
    if value == :present
      # reuse setter
      self.content=(@resource[:content])
    else
      File.unlink(@resource[:path])
    end
  end

  def content
    File.read(@resource[:path])
  end

  def content=(value)
    File.write(@resource[:path], value)
  end
end

This gives us the following:

demo@85c63b50bfa3:~$ cat > file_llmethod_create.pp <<EOF
$file = '/home/demo/puppet-file-lowlevel-method-create'
file {$file: ensure => absent} ->
file_llmethod {$file: content => 'This is Sparta!'}
EOF

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_create.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.01 seconds
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-create]/ensure: defined 'ensure' as 'present'

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_create.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.01 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-lowlevel-method-create]/ensure: removed
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-create]/ensure: defined 'ensure' as 'present'

demo@85c63b50bfa3:~$ cat > file_llmethod_change.pp <<EOF
$file = '/home/demo/puppet-file-lowlevel-method-change'
file {$file: content => 'This is madness'} ->
file_llmethod {$file: content => 'This is Sparta!'}
EOF

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_change.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-lowlevel-method-change]/ensure: defined content as '{sha256}0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e'
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-change]/content: content changed 'This is madness' to 'This is Sparta!'

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_change.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-lowlevel-method-change]/content: content changed '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2' to '{sha256}0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e'
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-change]/content: content changed 'This is madness' to 'This is Sparta!'

Our custom type already kind of works, even though we have not implemented any explicit comparison of is- and should-state. Puppet does this for us based on the Puppet catalog and the property getter return values. Our defined setters are also invoked by Puppet only on demand.

We can also see that the ensure state change notice is defined 'ensure' as 'present' and does not incorporate the desired content in any way, while the content state change notice shows plain text. Both tell us that the SHA256 checksum from the file_builtin.pp example is already something non-trivial.
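To make the comparison step more tangible, here is a simplified, Puppet-free model of what happens for a single property. It is only a sketch of the idea and not Puppet’s actual implementation; the names FakeProvider and sync_property are made up for illustration.

```ruby
# Simplified model of the is/should comparison for one property.
# FakeProvider and sync_property are hypothetical names, not Puppet API.
class FakeProvider
  attr_reader :setter_calls

  def initialize(content)
    @content = content
    @setter_calls = 0
  end

  # Getter: reports the current ("is") state.
  def content
    @content
  end

  # Setter: counts how often it is invoked.
  def content=(value)
    @setter_calls += 1
    @content = value
  end
end

# Compare is- and should-state and call the setter only on a mismatch.
# Returns a change notice, or nil if the property is already in sync.
def sync_property(provider, should)
  is = provider.content
  return nil if is == should

  provider.content = should
  "content changed '#{is}' to '#{should}'"
end

provider = FakeProvider.new('This is madness')
first  = sync_property(provider, 'This is Sparta!')
second = sync_property(provider, 'This is Sparta!')

puts first          # content changed 'This is madness' to 'This is Sparta!'
puts second.inspect # nil
```

The second call returns nil and leaves the setter untouched, which mirrors the on-demand setter invocation described above.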

Validating input

As a next step we add validation for path and content.

# modules/demo/lib/puppet/type/file_llmethod.rb

Puppet::Type.newtype(:file_llmethod) do
  newparam(:path, namevar: true) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
      fail "#{value} is not an absolute path" unless File.absolute_path?(value)
    end
  end

  newproperty(:content) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
    end
  end

  newproperty(:ensure) do
    newvalues(:present, :absent)
    defaultto(:present)
  end
end

Failed validations will look like these:

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules --exec 'file_llmethod {"./relative/path": }'
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Error: Parameter path failed on File_llmethod[./relative/path]: ./relative/path is not an absolute path (line: 1)

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules --exec 'file_llmethod {"/absolute/path": content => 42}'
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Error: Parameter content failed on File_llmethod[/absolute/path]: 42 is not a String (line: 1)
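The checks themselves are plain Ruby and can be tried without Puppet. The following sketch wraps them in two hypothetical helpers, validate_path and error_for, which are not part of Puppet. Note that File.absolute_path? is only available from Ruby 2.7 onwards, so the type above presumably needs at least that Ruby version.

```ruby
# Stand-alone version of the two path checks from the type definition.
# validate_path and error_for are hypothetical helpers for this demo.
def validate_path(value)
  raise ArgumentError, "#{value} is not a String" unless value.is_a?(String)
  raise ArgumentError, "#{value} is not an absolute path" unless File.absolute_path?(value)

  value
end

# Returns the validation error message, or nil if the value is accepted.
def error_for(value)
  validate_path(value)
  nil
rescue ArgumentError => e
  e.message
end

relative_error = error_for('./relative/path')
integer_error  = error_for(42)

puts relative_error # ./relative/path is not an absolute path
puts integer_error  # 42 is not a String
```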

Content Checksums

We override change_to_s so that state changes include content checksums:

# modules/demo/lib/puppet/type/file_llmethod.rb

require 'digest'

Puppet::Type.newtype(:file_llmethod) do
  newparam(:path, namevar: true) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
      fail "#{value} is not an absolute path" unless File.absolute_path?(value)
    end
  end

  newproperty(:content) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
    end
    define_method(:change_to_s) do |currentvalue, newvalue|
      old = "{sha256}#{Digest::SHA256.hexdigest(currentvalue)}"
      new = "{sha256}#{Digest::SHA256.hexdigest(newvalue)}"
      "content changed '#{old}' to '#{new}'"
    end
  end

  newproperty(:ensure) do
  newproperty(:ensure) do
    define_method(:change_to_s) do |currentvalue, newvalue|
      if currentvalue == :absent
        should = @resource.property(:content).should
        digest = "{sha256}#{Digest::SHA256.hexdigest(should)}"
        "defined content as '#{digest}'"
      else
        super(currentvalue, newvalue)
      end
    end
    newvalues(:present, :absent)
    defaultto(:present)
  end
end

The above type definition yields:

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_create.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-lowlevel-method-create]/ensure: removed
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-create]/ensure: defined content as '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2'

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_change.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-lowlevel-method-change]/content: content changed '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2' to '{sha256}0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e'
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-change]/content: content changed '{sha256}0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e' to '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2'

Improving memory footprint

So far so good. While our current implementation apparently works, it has at least one major flaw. If the managed file already exists, the provider stores the file’s whole content in memory.

demo@85c63b50bfa3:~$ cat > file_llmethod_change_big.pp <<EOF
$file = '/home/demo/puppet-file-lowlevel-method-change_big'
file_llmethod {$file: content => 'This is Sparta!'}
EOF

demo@85c63b50bfa3:~$ rm -f /home/demo/puppet-file-lowlevel-method-change_big
demo@85c63b50bfa3:~$ ulimit -Sv 200000

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_change_big.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-change_big]/ensure: defined content as '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2'
Notice: Applied catalog in 0.02 seconds

demo@85c63b50bfa3:~$ dd if=/dev/zero of=/home/demo/puppet-file-lowlevel-method-change_big seek=8G bs=1 count=1
1+0 records in
1+0 records out
1 byte copied, 8.3047e-05 s, 12.0 kB/s

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_change_big.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Error: Could not run: failed to allocate memory

Instead, the implementation should only store the checksum so that Puppet can decide based on checksums if our content= setter needs to be invoked.

This also means that the Puppet catalog’s content needs to be checksummed by munge before it is processed by Puppet’s internal comparison routine. Luckily we also have access to the original value via shouldorig.

# modules/demo/lib/puppet/type/file_llmethod.rb

require 'digest'

Puppet::Type.newtype(:file_llmethod) do
  newparam(:path, namevar: true) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
      fail "#{value} is not an absolute path" unless File.absolute_path?(value)
    end
  end

  newproperty(:content) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
    end
    munge do |value|
      "{sha256}#{Digest::SHA256.hexdigest(value)}"
    end
    # No need to override change_to_s with munging
  end

  newproperty(:ensure) do
    define_method(:change_to_s) do |currentvalue, newvalue|
      if currentvalue == :absent
        should = @resource.property(:content).should
        "defined content as '#{should}'"
      else
        super(currentvalue, newvalue)
      end
    end
    newvalues(:present, :absent)
    defaultto(:present)
  end
end

# modules/demo/lib/puppet/provider/file_llmethod/ruby.rb

Puppet::Type.type(:file_llmethod).provide(:ruby) do
  ...

  def content
    File.open(@resource[:path], 'r') do |file|
      sha = Digest::SHA256.new
      while chunk = file.read(2**16)
        sha << chunk
      end
      "{sha256}#{sha.hexdigest}"
    end
  end

  def content=(value)
    # value is munged, but we need to write the original
    File.write(@resource[:path], @resource.parameter(:content).shouldorig[0])
  end
end

Now we can manage big files:

demo@85c63b50bfa3:~$ ulimit -Sv 200000
demo@85c63b50bfa3:~$ dd if=/dev/zero of=/home/demo/puppet-file-lowlevel-method-change_big seek=8G bs=1 count=1
1+0 records in
1+0 records out
1 byte copied, 9.596e-05 s, 10.4 kB/s
demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_change_big.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-change_big]/content: content changed '{sha256}ef17a425c57a0e21d14bec2001d8fa762767b97145b9fe47c5d4f2fda323697b' to '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2'
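The chunked hashing that keeps memory usage flat is not specific to Ruby or Puppet. As a rough illustration only, the following Python sketch mirrors the idea: the current state is hashed in 64 KiB chunks, while the desired value is hashed in memory, and both sides are compared by checksum. The function names file_checksum and content_checksum are hypothetical and are not part of the prototype.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 2**16  # 64 KiB, the same chunk size the Ruby getter reads


def file_checksum(path):
    """Return '{sha256}<hex>' for a file, reading it chunk by chunk."""
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        while chunk := handle.read(CHUNK_SIZE):
            sha.update(chunk)
    return "{sha256}" + sha.hexdigest()


def content_checksum(content):
    """Checksum a desired value that is already in memory, as munge does."""
    return "{sha256}" + hashlib.sha256(content.encode("utf-8")).hexdigest()


# Usage: a file on disk and the desired cleartext are compared by checksum only.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("This is Sparta!")
try:
    print(file_checksum(tmp.name) == content_checksum("This is Sparta!"))
finally:
    os.unlink(tmp.name)
```

Only one chunk is held in memory at a time, so the size of the file no longer matters for the comparison.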

Ensure it the Puppet way

There is still something not right. Maybe you have noticed that our provider’s content getter attempts to open a file unconditionally, and yet the file_llmethod_create.pp run has not produced an error. It seems that an ensure transition from absent to present short-circuits the content getter, even though we have not expressed a wish to do so.

It turns out that an ensure property gets special treatment by Puppet. If we had attempted to use a makeitso property instead of ensure, there would be no short-circuiting and the content getter would raise an exception.

We will not fix the content getter though. If Puppet has special treatment for ensure, we should use Puppet’s intended mechanism for it, and declare the type ensurable:

# modules/demo/lib/puppet/type/file_llmethod.rb

require 'digest'

Puppet::Type.newtype(:file_llmethod) do
  ensurable

  newparam(:path, namevar: true) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
      fail "#{value} is not an absolute path" unless File.absolute_path?(value)
    end
  end

  newproperty(:content) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
    end
    munge do |value|
      "{sha256}#{Digest::SHA256.hexdigest(value)}"
    end
  end
end

With ensurable the provider needs to implement three new methods, but we can drop the ensure accessors:

# modules/demo/lib/puppet/provider/file_llmethod/ruby.rb

Puppet::Type.type(:file_llmethod).provide(:ruby) do
  def exists?
    File.exist?(@resource[:name])
  end

  def create
    self.content=(:dummy)
  end

  def destroy
    File.unlink(@resource[:name])
  end

  def content
    File.open(@resource[:path], 'r') do |file|
      sha = Digest::SHA256.new
      while chunk = file.read(2**16)
        sha << chunk
      end
      "{sha256}#{sha.hexdigest}"
    end
  end

  def content=(value)
    # value is munged, but we need to write the original
    File.write(@resource[:path], @resource.parameter(:content).shouldorig[0])
  end
end

However, now we have lost the SHA256 checksum on file creation:

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_create.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-lowlevel-method-create]/ensure: removed
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-create]/ensure: created

To get it back, we replace ensurable by an adapted implementation of it, which includes our previous change_to_s override:

newproperty(:ensure, :parent => Puppet::Property::Ensure) do
  defaultvalues
  define_method(:change_to_s) do |currentvalue, newvalue|
    if currentvalue == :absent
      should = @resource.property(:content).should
      "defined content as '#{should}'"
    else
      super(currentvalue, newvalue)
    end
  end
end

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_create.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-lowlevel-method-create]/ensure: removed
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-create]/ensure: defined content as '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2'

Our final low-level prototype is thus as follows.

Final low-level prototype

# modules/demo/lib/puppet/type/file_llmethod.rb

# frozen_string_literal: true

require 'digest'

Puppet::Type.newtype(:file_llmethod) do
  newparam(:path, namevar: true) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
      fail "#{value} is not an absolute path" unless File.absolute_path?(value)
    end
  end

  newproperty(:content) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
    end
    munge do |value|
      "{sha256}#{Digest::SHA256.hexdigest(value)}"
    end
  end

  newproperty(:ensure, :parent => Puppet::Property::Ensure) do
    defaultvalues
    define_method(:change_to_s) do |currentvalue, newvalue|
      if currentvalue == :absent
        should = @resource.property(:content).should
        "defined content as '#{should}'"
      else
        super(currentvalue, newvalue)
      end
    end
  end
end

# modules/demo/lib/puppet/provider/file_llmethod/ruby.rb

# frozen_string_literal: true

Puppet::Type.type(:file_llmethod).provide(:ruby) do
  def exists?
    File.exist?(@resource[:name])
  end

  def create
    self.content=(:dummy)
  end

  def destroy
    File.unlink(@resource[:name])
  end

  def content
    File.open(@resource[:path], 'r') do |file|
      sha = Digest::SHA256.new
      while (chunk = file.read(2**16))
        sha << chunk
      end
      "{sha256}#{sha.hexdigest}"
    end
  end

  def content=(_value)
    # value is munged, but we need to write the original
    File.write(@resource[:path], @resource.parameter(:content).shouldorig[0])
  end
end

Resource-API prototype

According to the Resource-API documentation we need to define our new file_rsapi type by calling Puppet::ResourceApi.register_type with several parameters, amongst which are the desired attributes, even ensure.

# modules/demo/lib/puppet/type/file_rsapi.rb

require 'puppet/resource_api'

Puppet::ResourceApi.register_type(
  name: 'file_rsapi', 
  attributes: {
    content: {
      desc: 'description of content parameter',
      type: 'String' 
    },
    ensure: {
      default: 'present',
      desc: 'description of ensure parameter',
      type: 'Enum[present, absent]'
    },
    path: {
      behaviour: :namevar,
      desc: 'description of path parameter',
      type: 'Pattern[/\A\/([^\n\/\0]+\/*)*\z/]'
    },
  },
  desc: 'description of file_rsapi'
)

The path type uses a built-in Puppet data type. Stdlib::Absolutepath would have been more convenient but external data types are not possible with the Resource-API yet.

In comparison with our low-level prototype, the above type definition has no SHA256-munging and SHA256-output counterparts. The canonicalize provider feature looks similar to munging, but we skip it for now.

The Resource-API documentation tells us to implement a get and a set method in our provider, stating

The get method reports the current state of the managed resources. It returns an enumerable of all existing resources. Each resource is a hash with attribute names as keys, and their respective values as values.

This demand is the first bummer, as we definitely do not want to read all files with their content and store it in memory. We can ignore this demand – how would the Resource-API know anyway.

However, the documented signature is def get(context) {...} where context has no information about the resource we want to manage.

This would have been a show-stopper, if the simple_get_filter provider feature didn’t exist, which changes the signature to def get(context, names = nil) {...}.

Our first version of file_rsapi is thus the following.

Basic functionality

# modules/demo/lib/puppet/type/file_rsapi.rb

require 'puppet/resource_api'

Puppet::ResourceApi.register_type(
  name: 'file_rsapi',
  features: %w[simple_get_filter],
  attributes: {
    content: {
      desc: 'description of content parameter',
      type: 'String' 
    },
    ensure: {
      default: 'present',
      desc: 'description of ensure parameter',
      type: 'Enum[present, absent]'
    },
    path: {
      behaviour: :namevar,
      desc: 'description of path parameter',
      type: 'Pattern[/\A\/([^\n\/\0]+\/*)*\z/]'
    },
  },
  desc: 'description of file_rsapi'
)

# modules/demo/lib/puppet/provider/file_rsapi/file_rsapi.rb

require 'digest'

class Puppet::Provider::FileRsapi::FileRsapi
  def get(context, names)
    (names or []).map do |name|
      File.exist?(name) ? {
        path: name,
        ensure: 'present',
        content: filedigest(name),
      } : nil
    end.compact # remove non-existing resources
  end

  def set(context, changes)
    changes.each do |path, change|
      if change[:should][:ensure] == 'present'
        File.write(path, change[:should][:content])
      elsif File.exist?(path)
        File.delete(path)
      end
    end
  end

  def filedigest(path)
    File.open(path, 'r') do |file|
      sha = Digest::SHA256.new
      while (chunk = file.read(2**16))
        sha << chunk
      end
      "{sha256}#{sha.hexdigest}"
    end
  end
end

The desired content is written correctly into the file, but we have again no SHA256 checksum on creation as well as unnecessary writes, because the checksum from get does not match the cleartext from the catalog:

demo@85c63b50bfa3:~$ cat > file_rsapi_create.pp <<EOF
$file = '/home/demo/puppet-file-rsapi-create'
file       {$file: ensure  => absent} ->
file_rsapi {$file: content => 'This is Sparta!'}
EOF

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_rsapi_create.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.03 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-rsapi-create]/ensure: removed
Notice: /Stage[main]/Main/File_rsapi[/home/demo/puppet-file-rsapi-create]/ensure: defined 'ensure' as 'present'
Notice: Applied catalog in 0.02 seconds

demo@85c63b50bfa3:~$ cat > file_rsapi_change.pp <<EOF
$file = '/home/demo/puppet-file-rsapi-change'
file       {$file: content => 'This is madness'} ->
file_rsapi {$file: content => 'This is Sparta!'}
EOF

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_rsapi_change.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.03 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-rsapi-change]/content: content changed '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2' to '{sha256}0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e'
Notice: /Stage[main]/Main/File_rsapi[/home/demo/puppet-file-rsapi-change]/content: content changed '{sha256}0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e' to 'This is Sparta!'
Notice: Applied catalog in 0.03 seconds

Hence we enable and implement the canonicalize provider feature.

Canonicalize

According to the documentation, canonicalize is applied to the results of get as well as to the catalog properties. On the one hand, we do not want to read the file’s content into memory; on the other hand, we must not checksum the content twice, since get already returns a checksum.

An easy way would be to check whether the canonicalize call happens after the get call:

def canonicalize(context, resources)
  if @stage == :get
    # do nothing, is-state canonicalization already performed by get()
  else
    # catalog canonicalization
    ...
  end
end

def get(context, paths)
  @stage = :get

  ...
end

While this works for the current implementation of the Resource-API, there is no guarantee about the order of canonicalize calls. Instead we subclass from String and handle checksumming internally. We also add some state change calls to the appropriate context methods.
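To illustrate the idea independently of Puppet, here is a minimal Python sketch of a string subclass that carries its original cleartext and makes canonicalization idempotent. The names CanonicalStr and of are hypothetical and chosen for this illustration only; the sketch ignores the Pathname case that the Ruby prototype handles.

```python
import hashlib


class CanonicalStr(str):
    """A checksum string that remembers the cleartext it was derived from."""

    def __new__(cls, canonical, original=None):
        obj = super().__new__(cls, canonical)
        obj.original = original
        return obj

    @classmethod
    def of(cls, value):
        # An already canonical value is returned unchanged,
        # so a second canonicalization pass is a no-op.
        if isinstance(value, cls):
            return value
        digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
        return cls("{sha256}" + digest, value)


once = CanonicalStr.of("This is Sparta!")
twice = CanonicalStr.of(once)
print(twice is once)   # True: no double checksumming
print(once.original)   # the cleartext is still available for writing
```

Because the type itself records whether a value is canonical, the order in which canonicalization and reading the current state happen no longer matters.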

Our final Resource-API-based prototype implementation is:

# modules/demo/lib/puppet/type/file_rsapi.rb

# frozen_string_literal: true

require 'puppet/resource_api'

Puppet::ResourceApi.register_type(
  name: 'file_rsapi',
  features: %w[simple_get_filter canonicalize],
  attributes: {
    content: {
      desc: 'description of content parameter',
      type: 'String'
    },
    ensure: {
      default: 'present',
      desc: 'description of ensure parameter',
      type: 'Enum[present, absent]'
    },
    path: {
      behaviour: :namevar,
      desc: 'description of path parameter',
      type: 'Pattern[/\A\/([^\n\/\0]+\/*)*\z/]'
    }
  },
  desc: 'description of file_rsapi'
)

# modules/demo/lib/puppet/provider/file_rsapi/file_rsapi.rb

# frozen_string_literal: true

require 'digest'
require 'pathname'

class Puppet::Provider::FileRsapi::FileRsapi
  class CanonicalString < String
    attr_reader :original

    def class
      # Mask as String for YAML.dump to mitigate
      #   Error: Transaction store file /var/cache/puppet/state/transactionstore.yaml
      #   is corrupt ((/var/cache/puppet/state/transactionstore.yaml): Tried to
      #   load unspecified class: Puppet::Provider::FileRsapi::FileRsapi::CanonicalString)
      String
    end

    def self.from(obj)
      return obj if obj.is_a?(self)
      return new(filedigest(obj)) if obj.is_a?(Pathname)

      new("{sha256}#{Digest::SHA256.hexdigest(obj)}", obj)
    end

    def self.filedigest(path)
      File.open(path, 'r') do |file|
        sha = Digest::SHA256.new
        while (chunk = file.read(2**16))
          sha << chunk
        end
        "{sha256}#{sha.hexdigest}"
      end
    end

    def initialize(canonical, original = nil)
      @original = original
      super(canonical)
    end
  end

  def canonicalize(_context, resources)
    resources.each do |resource|
      next if resource[:ensure] == 'absent'

      resource[:content] = CanonicalString.from(resource[:content])
    end
    resources
  end

  def get(_context, names)
    (names or []).map do |name|
      next unless File.exist?(name)

      {
        content: CanonicalString.from(Pathname.new(name)),
        ensure: 'present',
        path: name
      }
    end.compact # remove non-existing resources
  end

  def set(context, changes)
    changes.each do |path, change|
      if change[:should][:ensure] == 'present'
        File.write(path, change[:should][:content].original)
        if change[:is][:ensure] == 'present'
          # The only other possible change is due to content,
          # but content change transition info is covered implicitly
        else
          context.created("#{path} with content '#{change[:should][:content]}'")
        end
      elsif File.exist?(path)
        File.delete(path)
        context.deleted(path)
      end
    end
  end
end

It gives us:

demo@85c63b50bfa3:~$ cat > file_rsapi_create.pp <<EOF
$file = '/home/demo/puppet-file-rsapi-create'
file       {$file: ensure  => absent} ->
file_rsapi {$file: content => 'This is Sparta!'}
EOF

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_rsapi_create.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.03 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-rsapi-create]/ensure: removed
Notice: /Stage[main]/Main/File_rsapi[/home/demo/puppet-file-rsapi-create]/ensure: defined 'ensure' as 'present'
Notice: file_rsapi: Created: /home/demo/puppet-file-rsapi-create with content '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2'

demo@85c63b50bfa3:~$ cat > file_rsapi_change.pp <<EOF
$file = '/home/demo/puppet-file-rsapi-change'
file       {$file: content => 'This is madness'} ->
file_rsapi {$file: content => 'This is Sparta!'}
EOF

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_rsapi_change.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.03 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-rsapi-change]/content: content changed '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2' to '{sha256}0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e'
Notice: /Stage[main]/Main/File_rsapi[/home/demo/puppet-file-rsapi-change]/content: content changed '{sha256}0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e' to '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2'

demo@85c63b50bfa3:~$ cat > file_rsapi_remove.pp <<EOF
$file = '/home/demo/puppet-file-rsapi-remove'
file       {$file: ensure => present} ->
file_rsapi {$file: ensure => absent}
EOF

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_rsapi_remove.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.03 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-rsapi-remove]/ensure: created
Notice: /Stage[main]/Main/File_rsapi[/home/demo/puppet-file-rsapi-remove]/ensure: undefined 'ensure' from 'present'
Notice: file_rsapi: Deleted: /home/demo/puppet-file-rsapi-remove

Thoughts

The apparent need for a CanonicalString masking as String makes it look like we are missing something. If the Resource-API only checked data types after canonicalization, we could make get return something simpler than CanonicalString to signal an already canonicalized value.

The Resource-API’s default demand to return all existing resources simplifies development when one had this plan anyway. The low-level version of doing so is often a combination of prefetch and instances.

Introduction

Redis is a widely used, high-performance in-memory key-value database, which can also be used as a cache and message broker. It has been a go-to choice for many due to its performance and versatility. Many cloud providers offer Redis-based solutions:

Amazon Web Services (AWS) – Amazon ElastiCache for Redis
Microsoft Azure – Azure Cache for Redis
Google Cloud Platform (GCP) – Google Cloud Memorystore for Redis

However, due to recent changes in the licensing model of Redis, its prominence and usage are changing. Redis was initially developed under the open-source BSD license, allowing developers to freely use, modify and distribute the source code for both commercial and non-commercial purposes. As a result, Redis quickly gained popularity in the developer community.

However, Redis has recently changed to a dual source-available license. To be precise, in the future it will be available under either RSALv2 (Redis Source Available License Version 2) or SSPLv1 (Server Side Public License Version 1). Commercial use requires individual agreements, potentially increasing costs for cloud service providers. For a detailed overview of these changes, refer to the Redis licensing page. With the Redis Community Edition, the source code will remain freely available for developers, customers and partners of the company. However, cloud service providers and others who want to use Redis as part of commercial offerings will have to make individual agreements with the provider.

Due to these recent changes in Redis’s licensing model, many developers and organizations are re-evaluating their in-memory key-value database choices. Valkey, an open-source fork of Redis, maintains the high performance and versatility of Redis while ensuring unrestricted use for both developers and commercial entities. The Linux Foundation forked the project and contributors are now supporting the Valkey project. More information can be found here and here. Its commitment to open-source principles has gained support from major cloud providers, including AWS. Amazon Web Services (AWS) announced: “AWS is committed to supporting open source Valkey for the long term”; more information can be found here. So it may be the right time to switch the infrastructure from Redis to Valkey.

In this article, we will set up a Valkey instance with TLS and outline the steps to migrate your data from Redis seamlessly.

Overview of possible migration approaches

In general, there are several approaches to migrate:

Reuse the database file
In this approach, Redis is shut down so that the rdb file on disk is up to date, and Valkey is then started with this file in its data directory.
Use REPLICAOF to connect Valkey to the Redis instance
Register the new Valkey instance as a replica of a Redis master to stream the data. The Valkey instance and its network must be able to reach the Redis service.
Automated data migration to Valkey
Scripting the migration can be used on a machine that can reach both the Redis and Valkey databases.

In this blog article, we assume that direct access to the file system of the Redis server in the cloud is not available, so the database file cannot be reused, and that the Valkey service and the Redis service are in different networks and cannot reach each other to set up a replica. As a result, we choose the third option and run an automated data migration script on a different machine, which can connect to both servers and transfer the data.

Setup of Valkey

In case you are using a cloud service, please consult their instructions on how to set up a Valkey instance. Since it is a new project, only a few distributions provide ready-to-use packages, for example Red Hat Enterprise Linux 8 and 9 via Extra Packages for Enterprise Linux (EPEL). In this blog post, we use an on-premises Debian 12 server to host the Valkey server in version 7.2.6 with TLS. Please consult your distribution guides to install Valkey or use the manual provided on GitHub. The migration itself will be done by a Python 3 script using TLS.

Start the server and establish a client connection:

In this blog article, we will use a server with the listed TLS parameters. We specify all used TLS parameters including port 0 to disable the non-TLS port completely:

$ valkey-server --tls-port 6379 --port 0 --tls-cert-file ./tls/redis.crt --tls-key-file ./tls/redis.key --tls-ca-cert-file ./tls/ca.crt

                .+^+.                                                
            .+#########+.                                            
        .+########+########+.           Valkey 7.2.6 (579cca5f/0) 64 bit
    .+########+'     '+########+.                                    
 .########+'     .+.     '+########.    Running in standalone mode
 |####+'     .+#######+.     '+####|    Port: 6379
 |###|   .+###############+.   |###|    PID: 436767                     
 |###|   |#####*'' ''*#####|   |###|                                 
 |###|   |####'  .-.  '####|   |###|                                 
 |###|   |###(  (@@@)  )###|   |###|          https://valkey.io      
 |###|   |####.  '-'  .####|   |###|                                 
 |###|   |#####*.   .*#####|   |###|                                 
 |###|   '+#####|   |#####+'   |###|                                 
 |####+.     +##|   |#+'     .+####|                                 
 '#######+   |##|        .+########'                                 
    '+###|   |##|    .+########+'                                    
        '|   |####+########+'                                        
             +#########+'                                            
                '+v+'                                                

436767:M 27 Aug 2024 16:08:56.058 * Server initialized
436767:M 27 Aug 2024 16:08:56.058 * Loading RDB produced by valkey version 7.2.6
[...]
436767:M 27 Aug 2024 16:08:56.058 * Ready to accept connections tls

Now it is time to test the connection with a client using TLS:

$ valkey-cli --tls --cert ./tls/redis.crt --key ./tls/redis.key --cacert ./tls/ca.crt -p 6379

127.0.0.1:6379> INFO SERVER
# Server
server_name:valkey
valkey_version:7.2.6
[...]

Automated data migration to Valkey

Finally, we migrate the data in this example using a Python 3 script. This Python script establishes connections to both the Redis source and Valkey target databases, fetches all the keys from the Redis database and creates or updates each key-value pair in the Valkey database. This approach is not off the shelf and uses the redis-py library, which provides a list of examples. By using Python 3, the process could even be extended to filter unwanted data, alter values to suit the new environment or add sanity checks. The script, which is used here, provides progress updates during the migration process:

#!/usr/bin/env python3

import redis

# Connect to the Redis source database, which is password protected, via IP and port
redis_client = redis.StrictRedis(host='172.17.0.3', port=6379, password='secret', db=0)

# Connect to the Valkey target database, which is using TLS
ssl_certfile="./tls/client.crt"
ssl_keyfile="./tls/client.key"
ssl_ca_certs="./tls/ca.crt"
valkey_client = redis.Redis(
    host="192.168.0.3",
    port=6379,
    ssl=True,
    ssl_certfile=ssl_certfile,
    ssl_keyfile=ssl_keyfile,
    ssl_cert_reqs="required",
    ssl_ca_certs=ssl_ca_certs,
)

# Fetch all keys from the Redis database
keys = redis_client.keys('*')
print("Found", len(keys), "Keys in Source!")

# Migrate each key-value pair to the Valkey database
for counter, key in enumerate(keys):
    value = redis_client.get(key)
    valkey_client.set(key, value)
    print("Status: ", round((counter+1) / len(keys) * 100, 1), "%", end='\r')
print()

To start the process execute the script:

$ python3 redis_to_tls_valkey.py
Found 569383 Keys in Source!
Status: 100.0 %
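The script above copies values with GET and SET, which only covers plain string keys and drops expiry times. If the source also holds hashes, lists, sets or keys with a TTL, one possible extension is to iterate with SCAN and copy each key with DUMP and RESTORE. The following is a minimal sketch under assumptions: the function name migrate and the batch_size parameter are hypothetical, the clients must be created without decode_responses=True because DUMP returns binary data, and it presumes that the target server can read the serialization format produced by the source version.

```python
def migrate(source, target, batch_size=1000):
    """Copy every key from source to target, keeping data type and TTL.

    Both arguments are expected to behave like redis-py clients
    (scan_iter, dump, pttl on the source; restore on the target).
    """
    copied = 0
    for key in source.scan_iter(count=batch_size):
        payload = source.dump(key)
        if payload is None:
            continue  # key expired or was deleted after SCAN returned it
        ttl_ms = source.pttl(key)
        if ttl_ms == -2:
            continue  # key vanished between DUMP and PTTL
        # PTTL returns -1 for "no expiry", RESTORE expects 0 in that case.
        target.restore(key, max(ttl_ms, 0), payload, replace=True)
        copied += 1
    return copied


# Usage with the clients defined in the script above (not executed here):
# print("Migrated", migrate(redis_client, valkey_client), "keys")
```

Iterating with SCAN instead of KEYS also avoids blocking the source server while it collects a large key list.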

As a last step, configure your application to connect to the new Valkey server.

Conclusion

Since the change of Redis’ license, the new project Valkey is gaining more and more traction. Migrating to Valkey ensures continued access to a robust, open-source in-memory database without the licensing restrictions of Redis. Whether you’re running your infrastructure on-premises or in the cloud, this guide provides the steps needed for a successful migration. Migrating from a cloud instance to a new environment can be cumbersome when there is no direct file access or the networks are isolated. Given these circumstances, we used a Python script, which is a flexible way to implement the various steps needed to master the task.

If you found this guide helpful and need support migrating your databases, feel free to contact us. We are happy to support you on-premises or in cloud environments.

Mastering Cloud Infrastructure with Pulumi: Introduction

In today’s rapidly changing landscape of cloud computing, managing infrastructure as code (IaC) has become essential for developers and IT professionals. Pulumi, an open-source IaC tool, brings a fresh perspective to the table by enabling infrastructure management using popular programming languages like JavaScript, TypeScript, Python, Go, and C#. This approach offers a unique blend of flexibility and power, allowing developers to leverage their existing coding skills to build, deploy, and manage cloud infrastructure. In this post, we’ll explore the world of Pulumi and see how it pairs with Amazon FSx for NetApp ONTAP—a robust solution for scalable and efficient cloud storage.

Pulumi – The Theory

Why Pulumi?

Pulumi distinguishes itself among IaC tools for several compelling reasons:

Use Familiar Programming Languages: Unlike traditional IaC tools that rely on domain-specific languages (DSLs), Pulumi allows you to use familiar programming languages. This means no need to learn new syntax, and you can incorporate sophisticated logic, conditionals, and loops directly in your infrastructure code.
Seamless Integration with Development Workflows: Pulumi integrates effortlessly with existing development workflows and tools, making it a natural fit for modern software projects. Whether you’re managing a simple web app or a complex, multi-cloud architecture, Pulumi provides the flexibility to scale without sacrificing ease of use.

Challenges with Pulumi

Like any tool, Pulumi comes with its own set of challenges:

Learning Curve: While Pulumi leverages general-purpose languages, developers need to be proficient in the language they choose, such as Python or TypeScript. This can be a hurdle for those unfamiliar with these languages.
Growing Ecosystem: As a relatively new tool, Pulumi’s ecosystem is still expanding. It might not yet match the extensive plugin libraries of older IaC tools, but its vibrant and rapidly growing community is a promising sign of things to come.

State Management in Pulumi: Ensuring Consistency Across Deployments

Effective infrastructure management hinges on proper state handling. Pulumi excels in this area by tracking the state of your infrastructure, enabling it to manage resources efficiently. This capability ensures that Pulumi knows exactly what needs to be created, updated, or deleted during deployments. Pulumi offers several options for state storage:

Local State: Stored directly on your local file system. This option is ideal for individual projects or simple setups.
Remote State: By default, Pulumi stores state remotely on the Pulumi Service (a cloud-hosted platform provided by Pulumi), but it also allows you to configure storage on AWS S3, Azure Blob Storage, or Google Cloud Storage. This is particularly useful in team environments where collaboration is essential.

Managing state effectively is crucial for maintaining consistency across deployments, especially in scenarios where multiple team members are working on the same infrastructure.
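To make the role of state more concrete, here is a deliberately simplified Python sketch of how a tool can derive its actions by comparing recorded state with the desired resources. It is a conceptual illustration only and not Pulumi’s actual engine or API; the function name plan and the resource names are made up for this example.

```python
def plan(recorded, desired):
    """Compare recorded state with desired resources and list the actions.

    Both arguments map a resource name to a dict of its properties.
    """
    create = sorted(name for name in desired if name not in recorded)
    delete = sorted(name for name in recorded if name not in desired)
    update = sorted(
        name
        for name in desired
        if name in recorded and recorded[name] != desired[name]
    )
    return {"create": create, "update": update, "delete": delete}


recorded = {"bucket-logs": {"versioning": False}, "vm-old": {"size": "t3.micro"}}
desired = {"bucket-logs": {"versioning": True}, "vm-new": {"size": "t3.small"}}
print(plan(recorded, desired))
# {'create': ['vm-new'], 'update': ['bucket-logs'], 'delete': ['vm-old']}
```

Without a shared, up-to-date record of what already exists, two team members could compute different plans for the same code, which is why remote state matters in team setups.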

Other IaC Tools: Comparing Pulumi to Traditional IaC Tools

When comparing Pulumi to other Infrastructure as Code (IaC) tools, several drawbacks of traditional approaches become evident:

Domain-Specific Language (DSL) Limitations: Many IaC tools depend on DSLs, such as Terraform’s HCL, requiring users to learn a specialized language specific to the tool.
YAML/JSON Constraints: Tools that rely on YAML or JSON can be both restrictive and verbose, complicating the management of more complex configurations.
Steep Learning Curve: The necessity to master DSLs or particular configuration formats adds to the learning curve, especially for newcomers to IaC.
Limited Logical Capabilities: DSLs often lack support for advanced logic constructs such as loops, conditionals, and reusability. This limitation can lead to repetitive code that is challenging to maintain.
Narrow Ecosystem: Some IaC tools have a smaller ecosystem, offering fewer plugins, modules, and community-driven resources.
Challenges with Code Reusability: The inability to reuse code across different projects or components can hinder efficiency and scalability in infrastructure management.
Testing Complexity: Testing infrastructure configurations written in DSLs can be challenging, making it difficult to ensure the reliability and robustness of the infrastructure code.

Pulumi – In Practice

Introduction

In this section, we’ll dive into a practical example to better understand Pulumi’s capabilities. We’ll also explore how to set up a project using Pulumi with AWS and automate it using GitHub Actions for CI/CD.

Prerequisites

Before diving into using Pulumi with AWS and automating your infrastructure management through GitHub Actions, ensure you have the following prerequisites in place:

Pulumi CLI: Begin by installing the Pulumi CLI by following the official installation instructions. After installation, verify that Pulumi is correctly set up and accessible in your system’s PATH by running a quick version check.
AWS CLI: Install the AWS CLI, which is essential for interacting with AWS services. Configure the AWS CLI with your AWS credentials to ensure you have access to the necessary AWS resources. Ensure your AWS account is equipped with the required permissions, especially for IAM, EC2, S3, and any other AWS services you plan to manage with Pulumi.
AWS IAM User/Role for GitHub Actions: Create a dedicated IAM user or role in AWS specifically for use in your GitHub Actions workflows. This user or role should have permissions necessary to manage the resources in your Pulumi stack. Store the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY securely as secrets in your GitHub repository.
Pulumi Account: Set up a Pulumi account if you haven’t already. Generate a Pulumi access token and store it as a secret in your GitHub repository to facilitate secure automation.
Python and Pip: Install Python (version 3.7 or higher is recommended) along with Pip, which are necessary for Pulumi’s Python SDK. Once Python is installed, proceed to install Pulumi’s Python SDK along with any required AWS packages to enable infrastructure management through Python.
GitHub Account: Ensure you have an active GitHub account to host your code and manage your repository. Create a GitHub repository where you’ll store your Pulumi project and related automation workflows. Store critical secrets like AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and your Pulumi access token securely in the GitHub repository’s secrets section.
GitHub Runners: Utilize GitHub-hosted runners to execute your GitHub Actions workflows, or set up self-hosted runners if your project requires them. Confirm that the runners have all necessary tools installed, including Pulumi, AWS CLI, Python, and any other dependencies your Pulumi project might need.
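
The tool checks from this list can be scripted so they run the same way on a workstation and on a runner. The following is a minimal sketch, assuming the executables are called pulumi, aws, python3, and pip3 on your system; the helper name find_missing_tools is hypothetical and not part of any of these tools.

```python
import shutil
from typing import Iterable, List

# Assumed executable names; adjust them to match your environment.
REQUIRED_TOOLS = ("pulumi", "aws", "python3", "pip3")


def find_missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools are available.")
```

Running the script before the first deployment gives a quick signal about an incomplete setup without touching any cloud resources.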

Project Structure

When working with Infrastructure as Code (IaC) using Pulumi, maintaining an organized project structure is essential. A clear and well-defined directory structure not only streamlines the development process but also improves collaboration and deployment efficiency. In this post, we’ll explore a typical directory structure for a Pulumi project and explain the significance of each component.

Overview of a Typical Pulumi Project Directory

A standard Pulumi project might be organized as follows:


/project-root
├── .github
│   └── workflows
│       └── workflow.yml   # GitHub Actions workflow for CI/CD
├── __main__.py            # Entry point for the Pulumi program
├── infra.py               # Infrastructure code
├── Pulumi.dev.yaml        # Pulumi configuration for the development environment
├── Pulumi.prod.yaml       # Pulumi configuration for the production environment
├── Pulumi.yaml            # Pulumi project file (common or default settings)
├── requirements.txt       # Python dependencies
└── test_infra.py          # Tests for infrastructure code
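
A layout like this can be verified with a few lines of Python before a pipeline run. This is only a sketch: the helper name missing_project_files is hypothetical, and the list covers just a subset of the files shown above.

```python
from pathlib import Path
from typing import Iterable, List

# A subset of the files from the layout above, relative to the project root.
EXPECTED_FILES = (
    "__main__.py",
    "infra.py",
    "requirements.txt",
    ".github/workflows/workflow.yml",
)


def missing_project_files(root: Path, expected: Iterable[str] = EXPECTED_FILES) -> List[str]:
    """Return the expected files that do not exist below the given root."""
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_project_files(Path.cwd())
    print("Missing files:", missing if missing else "none")
```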

NetApp FSx on AWS

Introduction

Amazon FSx for NetApp ONTAP offers a fully managed, scalable storage solution built on the NetApp ONTAP file system. It provides high-performance, highly available shared storage that seamlessly integrates with your AWS environment. Leveraging the advanced data management capabilities of ONTAP, FSx for NetApp ONTAP is ideal for applications needing robust storage features and compatibility with existing NetApp systems.

Key Features

High Performance: FSx for ONTAP delivers low-latency storage designed to handle demanding, high-throughput workloads.
Scalability: Capable of scaling to support petabytes of storage, making it suitable for both small and large-scale applications.
Advanced Data Management: Leverages ONTAP’s comprehensive data management features, including snapshots, cloning, and disaster recovery.
Multi-Protocol Access: Supports the NFS, SMB, and iSCSI protocols, providing flexible access options for a variety of clients.
Cost-Effectiveness: Implements tiering policies to automatically move less frequently accessed data to lower-cost storage, helping optimize storage expenses.

What It’s About

Setting up Pulumi for managing your cloud infrastructure can revolutionize the way you deploy and maintain resources. By leveraging familiar programming languages, Pulumi brings Infrastructure as Code (IaC) to life, making the process more intuitive and efficient. When paired with Amazon FSx for NetApp ONTAP, it unlocks advanced storage solutions within the AWS ecosystem.

Putting It All Together

Using Pulumi, you can define and deploy a comprehensive AWS infrastructure setup, seamlessly integrating the powerful FSx for NetApp ONTAP file system. This combination simplifies cloud resource management and ensures you harness the full potential of NetApp’s advanced storage capabilities, making your cloud operations more efficient and robust.

In the next sections, we’ll walk through the specifics of setting up each component using Pulumi code, illustrating how to create a VPC, configure subnets, set up a security group, and deploy an FSx for NetApp ONTAP file system, all while leveraging the robust features provided by both Pulumi and AWS.

Architecture Overview

A visual representation of the architecture we’ll deploy using Pulumi: Single AZ Deployment with FSx and EC2

The diagram above illustrates the architecture for deploying an FSx for NetApp ONTAP file system within a single Availability Zone. The setup includes a VPC with public and private subnets, an Internet Gateway for outbound traffic, and a Security Group controlling access to the FSx file system and the EC2 instance. The EC2 instance is configured to mount the FSx volume using NFS, enabling seamless access to storage.

Setting up Pulumi

Follow these steps to set up Pulumi and integrate it with AWS:

Install Pulumi: Begin by installing Pulumi using the following command:

curl -fsSL https://get.pulumi.com | sh

Install AWS CLI: If you haven’t installed it yet, install the AWS CLI to manage AWS services:

pip install awscli

Configure AWS CLI: Configure the AWS CLI with your credentials:

aws configure

Create a New Pulumi Project: Initialize a new Pulumi project with AWS and Python:

pulumi new aws-python

Configure Your Pulumi Stack: Set the AWS region for your Pulumi stack:

pulumi config set aws:region eu-central-1

Deploy Your Stack: Deploy your infrastructure using Pulumi:

pulumi preview ; pulumi up

Example: VPC, Subnets, and FSx for NetApp ONTAP

Let’s dive into an example Pulumi project that sets up a Virtual Private Cloud (VPC), subnets, a security group, an Amazon FSx for NetApp ONTAP file system, and an EC2 instance.

Pulumi Code Example: VPC, Subnets, and FSx for NetApp ONTAP

The first step is to define the parameters required to set up the infrastructure. They live in the stack configuration file Pulumi.dev.yaml, shown in the example below.

The Pulumi.dev.yaml file contains the configuration settings for the dev stack of the Pulumi project: the AWS region, the availability zone, the name of the SSH key pair, and the CIDR blocks for the two subnets. The infrastructure code reads these values when it deploys resources in the specified AWS region.


config:
  aws:region: eu-central-1
  demo:availabilityZone: eu-central-1a
  demo:keyName: XYZ
  demo:subnet1CIDER: 10.0.3.0/24
  demo:subnet2CIDER: 10.0.4.0/24
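
Subnet ranges are easy to get wrong, so it can help to check them before deploying. The sketch below uses Python’s standard ipaddress module to confirm that the configured subnets lie inside the VPC range and do not overlap. The VPC range 10.0.0.0/16 is the one used in the infrastructure code of this example; the helper name validate_subnets is hypothetical.

```python
import ipaddress
from typing import Dict, List

VPC_CIDR = "10.0.0.0/16"  # VPC range used in this example's infrastructure code


def validate_subnets(vpc_cidr: str, subnet_cidrs: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the layout is consistent."""
    vpc = ipaddress.ip_network(vpc_cidr)
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in subnet_cidrs.items()}
    problems = []

    # Every subnet must be fully contained in the VPC range.
    for name, net in networks.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside the VPC range {vpc}")

    # No two subnets may share addresses.
    names = sorted(networks)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if networks[first].overlaps(networks[second]):
                problems.append(f"{first} overlaps {second}")
    return problems


if __name__ == "__main__":
    config = {"subnet1CIDER": "10.0.3.0/24", "subnet2CIDER": "10.0.4.0/24"}
    print(validate_subnets(VPC_CIDR, config) or "Subnet layout looks consistent.")
```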

The following code snippet should be placed in the infra.py file. It details the setup of the VPC, subnets, security group, and FSx for NetApp ONTAP file system. Each step in the code is explained through inline comments.


import pulumi
import pulumi_aws as aws
import pulumi_command as command
import os

# Retrieve configuration values from Pulumi configuration files
aws_config = pulumi.Config("aws")
region = aws_config.require("region")  # The AWS region where resources will be deployed

demo_config = pulumi.Config("demo")
availability_zone = demo_config.require("availabilityZone")  # Availability Zone for the deployment
subnet1_cidr = demo_config.require("subnet1CIDER")  # CIDR block for the public subnet
subnet2_cidr = demo_config.require("subnet2CIDER")  # CIDR block for the private subnet
key_name = demo_config.require("keyName")  # Name of the SSH key pair for EC2 instance access

# Create a new VPC with DNS support enabled
vpc = aws.ec2.Vpc(
    "fsxVpc",
    cidr_block="10.0.0.0/16",  # VPC CIDR block
    enable_dns_support=True,    # Enable DNS support in the VPC
    enable_dns_hostnames=True   # Enable DNS hostnames in the VPC
)

# Create an Internet Gateway to allow internet access from the VPC
internet_gateway = aws.ec2.InternetGateway(
    "vpcInternetGateway",
    vpc_id=vpc.id  # Attach the Internet Gateway to the VPC
)

# Create a public route table for routing internet traffic via the Internet Gateway
public_route_table = aws.ec2.RouteTable(
    "publicRouteTable",
    vpc_id=vpc.id,
    routes=[aws.ec2.RouteTableRouteArgs(
        cidr_block="0.0.0.0/0",  # Route all traffic (0.0.0.0/0) to the Internet Gateway
        gateway_id=internet_gateway.id
    )]
)

# Create a single public subnet in the specified Availability Zone
public_subnet = aws.ec2.Subnet(
    "publicSubnet",
    vpc_id=vpc.id,
    cidr_block=subnet1_cidr,  # CIDR block for the public subnet
    availability_zone=availability_zone,  # The specified Availability Zone
    map_public_ip_on_launch=True  # Assign public IPs to instances launched in this subnet
)

# Create a single private subnet in the same Availability Zone
private_subnet = aws.ec2.Subnet(
    "privateSubnet",
    vpc_id=vpc.id,
    cidr_block=subnet2_cidr,  # CIDR block for the private subnet
    availability_zone=availability_zone  # The same Availability Zone
)

# Associate the public subnet with the public route table to enable internet access
public_route_table_association = aws.ec2.RouteTableAssociation(
    "publicRouteTableAssociation",
    subnet_id=public_subnet.id,
    route_table_id=public_route_table.id
)

# Create a security group to control inbound and outbound traffic for the FSx file system
security_group = aws.ec2.SecurityGroup(
    "fsxSecurityGroup",
    vpc_id=vpc.id,
    description="Allow NFS traffic",  # Description of the security group
    ingress=[
        aws.ec2.SecurityGroupIngressArgs(
            protocol="tcp",
            from_port=2049,  # NFS protocol port
            to_port=2049,
            cidr_blocks=["0.0.0.0/0"]  # Allow NFS traffic from anywhere
        ),
        aws.ec2.SecurityGroupIngressArgs(
            protocol="tcp",
            from_port=111,  # RPCBind port for NFS
            to_port=111,
            cidr_blocks=["0.0.0.0/0"]  # Allow RPCBind traffic from anywhere
        ),
        aws.ec2.SecurityGroupIngressArgs(
            protocol="udp",
            from_port=111,  # RPCBind port for NFS over UDP
            to_port=111,
            cidr_blocks=["0.0.0.0/0"]  # Allow RPCBind traffic over UDP from anywhere
        ),
        aws.ec2.SecurityGroupIngressArgs(
            protocol="tcp",
            from_port=22,  # SSH port for EC2 instance access
            to_port=22,
            cidr_blocks=["0.0.0.0/0"]  # Allow SSH traffic from anywhere
        )
    ],
    egress=[
        aws.ec2.SecurityGroupEgressArgs(
            protocol="-1",  # Allow all outbound traffic
            from_port=0,
            to_port=0,
            cidr_blocks=["0.0.0.0/0"]  # Allow all outbound traffic to anywhere
        )
    ]
)

# Create the FSx for NetApp ONTAP file system in the private subnet
file_system = aws.fsx.OntapFileSystem(
    "fsxFileSystem",
    subnet_ids=[private_subnet.id],  # Deploy the FSx file system in the private subnet
    preferred_subnet_id=private_subnet.id,  # Preferred subnet for the FSx file system
    security_group_ids=[security_group.id],  # Attach the security group to the FSx file system
    deployment_type="SINGLE_AZ_1",  # Single Availability Zone deployment
    throughput_capacity=128,  # Throughput capacity in MB/s
    storage_capacity=1024  # Storage capacity in GB
)

# Create a Storage Virtual Machine (SVM) within the FSx file system
storage_virtual_machine = aws.fsx.OntapStorageVirtualMachine(
    "storageVirtualMachine",
    file_system_id=file_system.id,  # Associate the SVM with the FSx file system
    name="svm1",  # Name of the SVM
    root_volume_security_style="UNIX"  # Security style for the root volume
)

# Create a volume within the Storage Virtual Machine (SVM)
volume = aws.fsx.OntapVolume(
    "fsxVolume",
    storage_virtual_machine_id=storage_virtual_machine.id,  # Associate the volume with the SVM
    name="vol1",  # Name of the volume
    junction_path="/vol1",  # Junction path for mounting
    size_in_megabytes=10240,  # Size of the volume in MB
    storage_efficiency_enabled=True,  # Enable storage efficiency features
    tiering_policy=aws.fsx.OntapVolumeTieringPolicyArgs(
        name="SNAPSHOT_ONLY"  # Tiering policy for the volume
    ),
    security_style="UNIX"  # Security style for the volume
)

# Extract the DNS name from the list of SVM endpoints
dns_name = storage_virtual_machine.endpoints.apply(lambda e: e[0]['nfs'][0]['dns_name'])

# Get the latest Amazon Linux 2 AMI for the EC2 instance
ami = aws.ec2.get_ami(
    most_recent=True,
    owners=["amazon"],
    filters=[{"name": "name", "values": ["amzn2-ami-hvm-*-x86_64-gp2"]}]  # Filter for Amazon Linux 2 AMI
)

# Create an EC2 instance in the public subnet
ec2_instance = aws.ec2.Instance(
    "fsxEc2Instance",
    instance_type="t3.micro",  # Instance type for the EC2 instance
    vpc_security_group_ids=[security_group.id],  # Attach the security group to the EC2 instance
    subnet_id=public_subnet.id,  # Deploy the EC2 instance in the public subnet
    ami=ami.id,  # Use the latest Amazon Linux 2 AMI
    key_name=key_name,  # SSH key pair for accessing the EC2 instance
    tags={"Name": "FSx EC2 Instance"}  # Tag for the EC2 instance
)

# User data script to install NFS client and mount the FSx volume on the EC2 instance
user_data_script = dns_name.apply(lambda dns: f"""#!/bin/bash
sudo yum update -y
sudo yum install -y nfs-utils
sudo mkdir -p /mnt/fsx
if ! mountpoint -q /mnt/fsx; then
sudo mount -t nfs {dns}:/vol1 /mnt/fsx
fi
""")

# Retrieve the private key for SSH access from the environment (set by the GitHub Actions workflow).
# Never print or log this value: doing so would expose the key in the workflow logs.
private_key_content = os.getenv("PRIVATE_KEY")

# Run the mount script on the EC2 instance once the FSx file system and volume exist.
# Declaring the command as a regular resource (instead of creating it inside apply)
# keeps it visible in `pulumi preview`; depends_on enforces the ordering.
mount_fsx = command.remote.Command(
    "mountFsxFileSystem",
    connection=command.remote.ConnectionArgs(
        host=ec2_instance.public_ip,
        user="ec2-user",
        private_key=private_key_content
    ),
    create=user_data_script,
    opts=pulumi.ResourceOptions(depends_on=[file_system, volume])
)
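
The mount script in infra.py is built inline from the SVM’s DNS name. As a sketch of an alternative, the script can be produced by a small pure function, which makes it testable without any cloud access. The function name build_mount_script and its default arguments are assumptions for illustration; the commands mirror the script above.

```python
import shlex


def build_mount_script(dns_name: str, junction_path: str = "/vol1", mount_point: str = "/mnt/fsx") -> str:
    """Render a shell script that mounts an NFS export unless it is already mounted."""
    source = shlex.quote(f"{dns_name}:{junction_path}")
    target = shlex.quote(mount_point)
    return "\n".join([
        "#!/bin/bash",
        "sudo yum install -y nfs-utils",
        f"sudo mkdir -p {target}",
        f"if ! mountpoint -q {target}; then",
        f"  sudo mount -t nfs {source} {target}",
        "fi",
        "",
    ])


if __name__ == "__main__":
    # Placeholder host name, for demonstration only.
    print(build_mount_script("svm.example.internal"))
```

Inside infra.py, such a helper could be combined with the existing output, for example dns_name.apply(build_mount_script), so the rendered script still depends on the SVM endpoint.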

Pytest with Pulumi

Introduction

Pytest is a widely-used Python testing framework that allows developers to create simple and scalable test cases. When paired with Pulumi, an infrastructure as code (IaC) tool, Pytest enables thorough testing of cloud infrastructure code, akin to application code testing. This combination is crucial for ensuring that infrastructure configurations are accurate, secure, and meet the required state before deployment. By using Pytest with Pulumi, you can validate resource properties, mock cloud provider responses, and simulate various scenarios. This reduces the risk of deploying faulty infrastructure and enhances the reliability of your cloud environments. Although integrating Pytest into your CI/CD pipeline is not mandatory, it is highly beneficial as it leverages Python’s robust testing capabilities with Pulumi.

Testing Code

The following code snippet should be placed in the test_infra.py file. It tests the infrastructure defined in infra.py against mocked Pulumi resources, so no real cloud resources are created. The test cases shown here check the EC2 instance: its AMI ID, availability zone, tags, and user data. Inline comments explain each test case.


# Importing necessary libraries
import pulumi
import pulumi_aws as aws
from typing import Any, Dict, List

# Setting up configuration values for AWS region and various parameters
pulumi.runtime.set_config('aws:region', 'eu-central-1')
pulumi.runtime.set_config('demo:availabilityZone', 'eu-central-1a')  # Key must match the one required in infra.py
pulumi.runtime.set_config('demo:subnet1CIDER', '10.0.3.0/24')
pulumi.runtime.set_config('demo:subnet2CIDER', '10.0.4.0/24')
pulumi.runtime.set_config('demo:keyName', 'XYZ')  # Change based on your own key

# Creating a class MyMocks to mock Pulumi's resources for testing
class MyMocks(pulumi.runtime.Mocks):
    def new_resource(self, args: pulumi.runtime.MockResourceArgs) -> List[Any]:
        # Initialize outputs with the resource's inputs
        outputs = args.inputs

        # Mocking specific resources based on their type
        if args.typ == "aws:ec2/instance:Instance":
            # Mocking an EC2 instance with some default values
            outputs = {
                **args.inputs,  # Start with the given inputs
                "ami": "ami-0eb1f3cdeeb8eed2a",  # Mock AMI ID
                "availability_zone": "eu-central-1a",  # Mock availability zone
                "publicIp": "203.0.113.12",  # Mock public IP
                "publicDns": "ec2-203-0-113-12.compute-1.amazonaws.com",  # Mock public DNS
                "user_data": "mock user data script",  # Mock user data
                "tags": {"Name": "test"}  # Mock tags
            }
        elif args.typ == "aws:ec2/securityGroup:SecurityGroup":
            # Mocking a Security Group with default ingress rules
            outputs = {
                **args.inputs,
                "ingress": [
                    {"from_port": 80, "cidr_blocks": ["0.0.0.0/0"]},  # Allow HTTP traffic from anywhere
                    {"from_port": 22, "cidr_blocks": ["192.168.0.0/16"]}  # Allow SSH traffic from a specific CIDR block
                ]
            }
        
        # Returning a mocked resource ID and the output values
        return [args.name + '_id', outputs]

    def call(self, args: pulumi.runtime.MockCallArgs) -> Dict[str, Any]:
        # Mocking a call to get an AMI
        if args.token == "aws:ec2/getAmi:getAmi":
            return {
                "architecture": "x86_64",  # Mock architecture
                "id": "ami-0eb1f3cdeeb8eed2a",  # Mock AMI ID
            }
        
        # Return an empty dictionary if no specific mock is needed
        return {}

# Setting the custom mocks for Pulumi
pulumi.runtime.set_mocks(MyMocks())

# Import the infrastructure to be tested
import infra

# Define a test function to validate the AMI ID of the EC2 instance
@pulumi.runtime.test
def test_instance_ami():
    def check_ami(ami_id: str) -> None:
        print(f"AMI ID received: {ami_id}")
        # Assertion to ensure the AMI ID is the expected one
        assert ami_id == "ami-0eb1f3cdeeb8eed2a", 'EC2 instance must have the correct AMI ID'

    # Return the Output so that @pulumi.runtime.test waits for it and the assertion actually runs
    return infra.ec2_instance.ami.apply(check_ami)

# Define a test function to validate the availability zone of the EC2 instance
@pulumi.runtime.test
def test_instance_az():
    def check_az(availability_zone: str) -> None:
        print(f"Availability Zone received: {availability_zone}")
        # Assertion to ensure the instance is in the correct availability zone
        assert availability_zone == "eu-central-1a", 'EC2 instance must be in the correct availability zone'
    
    # Return the Output so that @pulumi.runtime.test waits for it and the assertion actually runs
    return infra.ec2_instance.availability_zone.apply(check_az)

# Define a test function to validate the tags of the EC2 instance
@pulumi.runtime.test
def test_instance_tags():
    def check_tags(tags: Dict[str, Any]) -> None:
        print(f"Tags received: {tags}")
        # Assertions to ensure the instance has tags and a 'Name' tag
        assert tags, 'EC2 instance must have tags'
        assert 'Name' in tags, 'EC2 instance must have a Name tag'
    
    # Return the Output so that @pulumi.runtime.test waits for it and the assertions actually run
    return infra.ec2_instance.tags.apply(check_tags)

# Define a test function to validate the user data script of the EC2 instance
@pulumi.runtime.test
def test_instance_userdata():
    def check_user_data(user_data_script: str) -> None:
        print(f"User data received: {user_data_script}")
        # Assertion to ensure the instance has user data configured
        assert user_data_script is not None, 'EC2 instance must have user_data_script configured'
    
    # Return the Output so that @pulumi.runtime.test waits for it and the assertion actually runs
    return infra.ec2_instance.user_data.apply(check_user_data)
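
Beyond property checks, tests can also encode security rules. The sketch below is a plain-Python helper that reports sensitive ports reachable from 0.0.0.0/0. It assumes ingress rules in the dictionary shape used by the mock above (keys from_port, to_port, cidr_blocks); rule objects from real resource outputs may need to be converted into that shape first. The name find_world_open_ports is hypothetical.

```python
from typing import Any, Dict, Iterable, List

OPEN_CIDR = "0.0.0.0/0"


def find_world_open_ports(
    ingress_rules: Iterable[Dict[str, Any]],
    sensitive_ports: Iterable[int] = (22,),
) -> List[int]:
    """Return the sensitive ports that are reachable from any address."""
    sensitive = set(sensitive_ports)
    exposed = set()
    for rule in ingress_rules:
        # Only rules that allow traffic from anywhere are relevant here.
        if OPEN_CIDR not in rule.get("cidr_blocks", []):
            continue
        start = rule.get("from_port", 0)
        end = rule.get("to_port", start)  # Treat a missing to_port as a single-port rule
        exposed.update(port for port in sensitive if start <= port <= end)
    return sorted(exposed)


if __name__ == "__main__":
    rules = [
        {"from_port": 80, "cidr_blocks": ["0.0.0.0/0"]},
        {"from_port": 22, "cidr_blocks": ["192.168.0.0/16"]},
    ]
    print(find_world_open_ports(rules))
```

A test could assert that the returned list is empty, which fails as soon as SSH is opened to the whole internet.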

GitHub Actions

Introduction

GitHub Actions is a powerful automation tool integrated within GitHub, enabling developers to automate their workflows, including testing, building, and deploying code. Pulumi, on the other hand, is an Infrastructure as Code (IaC) tool that allows you to manage cloud resources using familiar programming languages. In this post, we’ll explore why you should use GitHub Actions and its specific purpose when combined with Pulumi.

Why Use GitHub Actions and Its Importance

GitHub Actions is a powerful tool for automating workflows within your GitHub repository, offering several key benefits, especially when combined with Pulumi:

Integrated CI/CD: GitHub Actions seamlessly integrates Continuous Integration and Continuous Deployment (CI/CD) directly into your GitHub repository. This automation enhances consistency in testing, building, and deploying code, reducing the risk of manual errors.
Custom Workflows: It allows you to create custom workflows for different stages of your software development lifecycle, such as code linting, running unit tests, or managing complex deployment processes. This flexibility ensures your automation aligns with your specific needs.
Event-Driven Automation: You can trigger GitHub Actions with events like pushes, pull requests, or issue creation. This event-driven approach ensures that tasks are automated precisely when needed, streamlining your workflow.
Reusable Code: GitHub Actions supports reusable “actions” that can be shared across multiple workflows or repositories. This promotes code reuse and maintains consistency in automation processes.
Built-in Marketplace: The GitHub Marketplace offers a wide range of pre-built actions from the community, making it easy to integrate third-party services or implement common tasks without writing custom code.
Enhanced Collaboration: By using GitHub’s pull request and review workflows, teams can discuss and approve changes before deployment. This process reduces risks and improves collaboration on infrastructure changes.
Automated Deployment: GitHub Actions automates the deployment of infrastructure code, using Pulumi to apply changes. This automation reduces the risk of manual errors and ensures a consistent deployment process.
Testing: Running tests before deploying with GitHub Actions helps confirm that your infrastructure code works correctly, catching potential issues early and ensuring stability.
Configuration Management: It manages and sets up necessary configurations for Pulumi and AWS, ensuring your environment is correctly configured for deployments.
Preview and Apply Changes: GitHub Actions allows you to preview changes before applying them, helping you understand the impact of modifications and minimizing the risk of unintended changes.
Cleanup: You can optionally destroy the stack after testing or deployment, helping control costs and maintain a clean environment.

Execution

To execute the GitHub Actions workflow:

Placement: Save the workflow YAML file in your repository’s .github/workflows directory. GitHub Actions automatically detects workflow files in this directory, and the push trigger defined in the workflow runs it whenever there’s a push to the main branch of your repository.
Workflow Actions: The workflow file performs several critical actions:
- Environment Setup: Configures the necessary environment for running the workflow.
- Dependency Installation: Installs the required dependencies, including Pulumi CLI and other Python packages.
- Testing: Runs your tests to verify that your infrastructure code functions as expected.
- Preview and Apply Changes: Uses Pulumi to preview and apply any changes to your infrastructure.
- Cleanup: Optionally destroys the stack after tests or deployment to manage costs and maintain a clean environment.

By incorporating this workflow, you ensure that your Pulumi infrastructure is continuously integrated and deployed with proper validation, significantly improving the reliability and efficiency of your infrastructure management process.
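
The order of these stages matters: tests run first, then the preview, then the update, and only optionally the cleanup. As a sketch, the same sequence can be reproduced locally with a small Python runner. The function names build_pipeline and run_pipeline are hypothetical; the commands and flags are the ones used in the workflow of this post.

```python
import subprocess
from typing import List


def build_pipeline(stack: str = "dev", destroy: bool = False) -> List[List[str]]:
    """Return the commands of the deployment pipeline in execution order."""
    steps = [
        ["pytest"],                                   # Testing
        ["pulumi", "preview", "--stack", stack],      # Preview changes
        ["pulumi", "up", "--yes", "--stack", stack],  # Apply changes
    ]
    if destroy:
        steps.append(["pulumi", "destroy", "--yes", "--stack", stack])  # Optional cleanup
    return steps


def run_pipeline(steps: List[List[str]]) -> None:
    """Run each command and stop at the first failure."""
    for step in steps:
        print("Running:", " ".join(step))
        subprocess.run(step, check=True)  # Raises CalledProcessError on a non-zero exit code


if __name__ == "__main__":
    # Only print the plan here; call run_pipeline(...) to execute it.
    for command_line in build_pipeline(destroy=True):
        print(" ".join(command_line))
```

Because check=True aborts on the first failing command, a failed test run prevents the later preview and update steps, which mirrors how a failing step stops the workflow job.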

Example: Deploy infrastructure with Pulumi


name: Pulumi Deployment

on:
  push:
    branches:
      - main

env:
  # Environment variables for AWS credentials and private key.
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  AWS_DEFAULT_REGION: ${{ secrets.AWS_DEFAULT_REGION }}
  PRIVATE_KEY: ${{ secrets.PRIVATE_KEY }}

jobs:
  pulumi-deploy:
    runs-on: ubuntu-latest
    environment: dev

    steps:
      - name: Checkout code
        uses: actions/checkout@v3
        # Check out the repository code to the runner.

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v3
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: eu-central-1
        # Set up AWS credentials for use in subsequent actions.

      - name: Set up SSH key
        run: |
          mkdir -p ~/.ssh
          echo "${{ secrets.SSH_PRIVATE_KEY }}" > ~/.ssh/XYZ.pem
          chmod 600 ~/.ssh/XYZ.pem
        # Create an SSH directory, add the private SSH key, and set permissions.

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
        # Set up Python 3.9 environment for running Python-based tasks.
  
      - name: Install Pulumi
        run: |
          curl -fsSL https://get.pulumi.com | sh
          echo "$HOME/.pulumi/bin" >> "$GITHUB_PATH"
        # Install the Pulumi CLI with the official install script and add it to PATH.
        # A Python-only project needs no Node.js setup for this.

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
        working-directory: .
        # Upgrade pip and install Python dependencies from `requirements.txt`.

      - name: Login to Pulumi
        run: pulumi login
        env:
          PULUMI_ACCESS_TOKEN: ${{ secrets.PULUMI_ACCESS_TOKEN }}
        # Log in to Pulumi using the access token stored in secrets.
        
      - name: Set Pulumi configuration for tests
        run: pulumi config set aws:region eu-central-1 --stack dev
        # Set Pulumi configuration to specify AWS region for the `dev` stack.

      - name: Pulumi stack select
        run: pulumi stack select dev  
        working-directory: .
        # Select the `dev` stack for Pulumi operations.

      - name: Run tests
        run: |
          pulumi config set aws:region eu-central-1
          pytest
        working-directory: .
        # Set AWS region configuration and run tests using pytest.
    
      - name: Preview Pulumi changes
        run: pulumi preview --stack dev
        working-directory: .
        # Preview the changes that Pulumi will apply to the `dev` stack.
    
      - name: Update Pulumi stack
        run: pulumi up --yes --stack dev
        working-directory: . 
        # Apply the changes to the `dev` stack with Pulumi.

      - name: Pulumi stack output
        run: pulumi stack output
        working-directory: .
        # Retrieve and display outputs from the Pulumi stack.

      - name: Cleanup Pulumi stack
        run: pulumi destroy --yes --stack dev
        working-directory: . 
        # Destroy the `dev` stack to clean up resources.

      - name: Pulumi stack output (after destroy)
        run: pulumi stack output
        working-directory: .
        # Retrieve and display outputs from the Pulumi stack after destruction.

      - name: Logout from Pulumi
        run: pulumi logout
        # Log out from the Pulumi session.

Output:

Finally, let’s take a look at how everything appears both in GitHub and in AWS. Check out the screenshots below to see the GitHub Actions workflow in action and the resulting AWS resources.

GitHub Actions workflow

AWS FSx resource

AWS FSx storage virtual machine

Outlook: Exploring Advanced Features of Pulumi and Amazon FSx for NetApp ONTAP

As you become more comfortable with Pulumi and Amazon FSx for NetApp ONTAP, there are numerous advanced features and capabilities to explore. These can significantly enhance your infrastructure automation and storage management strategies. In a follow-up blog post, we will delve into these advanced topics, providing a deeper understanding and practical examples.

Advanced Features of Pulumi

Cross-Cloud Infrastructure Management
Pulumi supports multiple cloud providers, including AWS, Azure, Google Cloud, and Kubernetes. In more advanced scenarios, you can manage resources across different clouds in a single Pulumi project, enabling true multi-cloud and hybrid cloud architectures.
Component Resources
Pulumi allows you to create reusable components by grouping related resources into custom classes. This is particularly useful for complex deployments where you want to encapsulate and reuse configurations across different projects or environments.
Automation API
Pulumi’s Automation API enables you to embed Pulumi within your own applications, allowing for infrastructure to be managed programmatically. This can be useful for building custom CI/CD pipelines or integrating with other systems.
Policy as Code with Pulumi CrossGuard
Pulumi CrossGuard allows you to enforce compliance and security policies across your infrastructure using familiar programming languages. Policies can be applied to ensure that resources adhere to organizational standards, improving governance and reducing risk.
Stack References and Dependency Management
Pulumi’s stack references enable you to manage dependencies between different Pulumi stacks, allowing for complex, interdependent infrastructure setups. This is crucial for large-scale environments where components must interact and be updated in a coordinated manner.
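
To make the Automation API more concrete, here is a minimal sketch of how a preview and update could be driven from Python instead of the CLI. The project name fsx-demo and the helper names are assumptions for illustration; the pulumi.automation calls require the Pulumi SDK and CLI to be installed, and the program argument stands for any function that declares resources.

```python
from typing import Callable, Dict

PROJECT_NAME = "fsx-demo"  # Hypothetical project name


def stack_config(region: str = "eu-central-1") -> Dict[str, str]:
    """Plain configuration values to apply to the stack."""
    return {"aws:region": region}


def preview_and_deploy(program: Callable[[], None], stack_name: str = "dev") -> None:
    """Preview and deploy an inline Pulumi program via the Automation API."""
    # Imported lazily so that stack_config stays usable without the Pulumi SDK.
    from pulumi import automation as auto

    stack = auto.create_or_select_stack(
        stack_name=stack_name,
        project_name=PROJECT_NAME,
        program=program,
    )
    for key, value in stack_config().items():
        stack.set_config(key, auto.ConfigValue(value=value))

    stack.preview(on_output=print)  # Show the planned changes
    stack.up(on_output=print)       # Apply the changes


if __name__ == "__main__":
    print(stack_config())
```

Embedding the deployment in a program like this is one way to build custom tooling around Pulumi; for a plain CI/CD pipeline, the CLI-based workflow shown earlier is usually sufficient.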

Advanced Features of Amazon FSx for NetApp ONTAP

Data Protection and Snapshots
FSx for NetApp ONTAP offers advanced data protection features, including automated snapshots, SnapMirror for disaster recovery, and integration with AWS Backup. These features help safeguard your data and ensure business continuity.
Data Tiering and Cost Optimization
FSx for ONTAP includes intelligent data tiering, which automatically moves infrequently accessed data to lower-cost storage. This feature is vital for optimizing costs, especially in environments with large amounts of data that have varying access patterns.
Multi-Protocol Access and CIFS/SMB Integration
FSx for ONTAP supports multiple protocols, including NFS, SMB, and iSCSI, enabling seamless access from both Linux/Unix and Windows clients. This is particularly useful in mixed environments where applications or users need to access the same data using different protocols.
Performance Tuning and Quality of Service (QoS)
FSx for ONTAP allows you to fine-tune performance parameters and implement QoS policies, ensuring that critical workloads receive the necessary resources. This is essential for applications with stringent performance requirements.
ONTAP System Manager and API Integration
Advanced users can leverage the ONTAP System Manager or integrate with NetApp’s extensive API offerings to automate and customize the management of FSx for ONTAP. This level of control is invaluable for organizations looking to tailor their storage solutions to specific needs.

What’s Next?

In the next blog post, we will explore these advanced features in detail, providing practical examples and use cases. We’ll dive into multi-cloud management with Pulumi, demonstrate the creation of reusable infrastructure components, and explore how to enforce security and compliance policies with Pulumi CrossGuard. Additionally, we’ll examine advanced data management strategies with FSx for NetApp ONTAP, including snapshots, data tiering, and performance optimization.

Stay tuned as we take your infrastructure as code and cloud storage management to the next level!

Conclusion

This example demonstrates how Pulumi can be used to manage AWS infrastructure using Python. By defining resources like VPCs, subnets, security groups, and FSx file systems in code, you can version control your infrastructure and easily reproduce environments.

Amazon FSx for NetApp ONTAP offers a powerful and flexible solution for running file-based workloads in the cloud, combining the strengths of AWS and NetApp ONTAP. Pulumi’s ability to leverage existing programming languages for infrastructure management allows for more complex logic and better integration with your development workflows. However, it requires familiarity with these languages and has a smaller ecosystem compared to Terraform. Despite these differences, Pulumi is a powerful tool for managing modern cloud infrastructure.

Disclaimer

The information provided in this blog post is for educational and informational purposes only. The features and capabilities of Pulumi and Amazon FSx for NetApp ONTAP mentioned are subject to change as new versions and updates are released. While we strive to ensure that the content is accurate and up-to-date, we cannot guarantee that it reflects the latest changes or improvements. Always refer to the official documentation and consult with your cloud provider or technology partner for the most current and relevant information. The author and publisher of this blog post are not responsible for any errors or omissions, or for any actions taken based on the information provided.

Veeam & Proxmox VE

Veeam has made a strategic move by integrating the open-source virtualization solution Proxmox VE (Virtual Environment) into its portfolio. Signaling its commitment to the evolving needs of the open-source community and the open-source virtualization market, this integration positions Veeam as a forward-thinking player in the industry, ready to support the rising tide of open-source solutions. The combination of Veeam’s data protection solutions with the flexibility of Proxmox VE’s platform offers enterprises a compelling alternative that promises cost savings and enhanced data security.

With Proxmox VE, one of the most important and most frequently requested open-source hypervisors is now natively supported, and this could well shift the virtualization market!

Opportunities for Open-Source Virtualization

In many enterprises, a major hypervisor platform is already in place, accompanied by a robust backup solution – often Veeam. However, until recently, Veeam lacked direct support for Proxmox VE, leaving a gap for those who have embraced or are considering this open-source virtualization platform. The latest version of Veeam changes the game by introducing the capability to create and manage backups and restores directly within Proxmox VE environments, without the need for agents inside the VMs.

This advancement means that entire VMs can now be backed up and restored across any supported hypervisor, providing a great deal of flexibility. Moreover, enterprises can seamlessly integrate a new Proxmox VE-based cluster into their existing Veeam setup, managing everything from a single, central point. This integration simplifies operations, reduces complexity, and enhances the overall efficiency of data protection strategies in environments that include multiple hypervisors, because a single one-size-fits-all solution is in place.

Another heavily underestimated benefit is the ability to easily migrate, copy, back up and restore entire VMs independent of their underlying hypervisor, also known as cross-platform recovery. As a result, operators are now able to shift VMs from VMware ESXi nodes / vSphere or Hyper-V to Proxmox VE nodes. This makes it possible to introduce and evaluate a new virtualization platform with minimal risk. For organizations looking to unify their virtualization and backup infrastructure, this update offers a significant leap forward.

Integration into Veeam

Integrating a new Proxmox cluster into an existing Veeam setup is a testament to the simplicity and user-centric design of both systems. Those familiar with Veeam will find the process to be intuitive and minimally disruptive, allowing for a seamless extension of their virtualization environment. This ease of integration means that your new Proxmox VE cluster can be swiftly brought under the protective umbrella of Veeam’s robust backup and replication services.

Despite the general ease of the process, it’s important to recognize that unique configurations and specific environments may present their own set of challenges. These corner cases, while not common, are worth noting as they can require special attention to ensure a smooth integration. Rest assured, however, that these are merely nuances in an otherwise straightforward procedure, and with a little extra care, even these can be managed effectively.

Overview

Starting with version 12.2, the Proxmox VE support is enabled and integrated by a plugin which gets installed on the Veeam Backup server. Veeam Backup for Proxmox incorporates a distributed architecture that necessitates the deployment of worker nodes. These nodes function analogously to data movers, facilitating the transfer of virtual machine payloads from the Proxmox VE hosts to the designated Backup Repository. The workers operate on a Linux platform and are seamlessly instantiated via the Veeam Backup Server console. Their role is critical and akin to that of proxy components in analogous systems such as AHV or VMware backup solutions.

At least one worker is required per cluster. For improved performance, one worker for each Proxmox VE node might be considered. Keep in mind that each worker requires 6 vCPUs, 6 GB of memory and 100 GB of disk space.
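To get a feeling for what these requirements add up to, the arithmetic can be sketched in a few lines of Python. This is only a rough capacity estimate based on the per-worker figures stated above; the function and class names are made up for illustration, and the three-node cluster is a hypothetical example.

```python
from dataclasses import dataclass

# Per-worker requirements as stated in the text above.
VCPU_PER_WORKER = 6
MEMORY_GB_PER_WORKER = 6
DISK_GB_PER_WORKER = 100


@dataclass(frozen=True)
class WorkerFootprint:
    """Total resources reserved for Veeam workers in one cluster."""
    workers: int
    vcpu: int
    memory_gb: int
    disk_gb: int


def worker_footprint(workers: int) -> WorkerFootprint:
    """Multiply the per-worker requirements by the number of workers."""
    if workers < 1:
        raise ValueError("at least one worker per cluster is required")
    return WorkerFootprint(
        workers=workers,
        vcpu=workers * VCPU_PER_WORKER,
        memory_gb=workers * MEMORY_GB_PER_WORKER,
        disk_gb=workers * DISK_GB_PER_WORKER,
    )


if __name__ == "__main__":
    # Hypothetical three-node cluster: minimum setup vs. one worker per node.
    print(worker_footprint(1))
    print(worker_footprint(3))
```

Comparing the minimum setup with the one-worker-per-node variant makes it easier to decide whether the additional throughput is worth the reserved resources.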

Requirements

This blog post assumes that an installation of Veeam Backup & Replication in version 12.2 or later is already in place and fully configured for another environment such as VMware. It also assumes that the Proxmox VE cluster already exists and that credentials with the roles needed to perform the backup and restore actions are available.

Configuration

The integration and configuration of a Proxmox VE cluster can be done entirely within the Veeam Backup & Replication Console application and does not require any additional commands to be executed on a CLI. The previously mentioned worker nodes can be installed fully automatically.

Adding a Proxmox Server

To integrate a new Proxmox Server into the Veeam Backup & Replication environment, one must initiate the process by accessing the Veeam console. Subsequently, navigate through the designated sections to complete the addition:

Virtual Infrastructure -> Add Server

This procedure is consistent with the established protocol for incorporating nodes from other virtualization platforms that are compatible with Veeam.

Afterwards, Veeam shows you a selection of possible and supported Hypervisors:

VMware vSphere
Microsoft Hyper-V
Nutanix AHV
Red Hat Virtualization
Oracle Virtualization Manager
Proxmox VE

In this case we simply choose Proxmox VE and proceed with the setup wizard.

During the next steps in the setup wizard, the authentication details, the hostname or IP address of the target Proxmox VE server and also a snapshot storage of the Proxmox VE server must be defined.

Hint: When it comes to the authentication details, take care to use working credentials for the SSH service on the Proxmox VE server. If you usually use the root@pam credentials for the web interface, you simply need to provide root to Veeam. Veeam will initiate a connection to the system over the SSH protocol.

In one of the last steps of the setup wizard, Veeam offers to automatically install the required worker node. Such a worker node is a small VM that runs inside the cluster on the targeted Proxmox VE server. In general, a single worker node per cluster is enough, but to enhance the overall performance, one worker for each node is recommended.

Usage

Once the Proxmox VE server has been successfully integrated into the Veeam inventory, it can be managed as effortlessly as any other supported hypervisor, such as VMware vSphere or Microsoft Hyper-V. A significant advantage, as shown in the screenshot, is the capability to centrally administrate various hypervisors and servers in clusters. This eliminates the necessity for a separate Veeam instance for each cluster, streamlining operations. Nonetheless, there may be specific scenarios where individual setups for each cluster are preferable.

As a result, this not only simplifies the operator’s work when dealing with different servers and clusters but also finally provides the opportunity for cross-hypervisor recoveries.

Creating Backup Jobs

Creating a new backup job for a single VM or even multiple VMs in a Proxmox environment is just as simple as for other hypervisors and works exactly the same way. Nevertheless, here is a quick summary of the required tasks:

Open the Veeam Backup & Replication console on your backup server or management workstation. To start creating a backup job, navigate to the Home tab and click on Backup Job, then select Virtual machine from the drop-down menu.

When the New Backup Job wizard opens, you will need to enter a name and a description for the backup job. Click Next to proceed to the next step. Now, you will need to select the VMs that you want to back up. Click Add in the Virtual Machines step and choose the individual VMs or containers like folders, clusters, or entire hosts that you want to include in the backup. Once you have made your selection, click Next.

The next step is to specify where you want to store the backup files. In the Storage step, select the backup repository and decide on the retention policy that dictates how long you want to keep the backup data. After setting this up, click Next.

If you have configured multiple backup proxies, the next step allows you to specify which one to use. If you are not sure or if you prefer, you can let Veeam Backup & Replication automatically select the best proxy for the job. Click Next after making your choice.

Now it is time to schedule when the backup job should run. In the Schedule step, you can set up the job to run automatically at specific times or in response to certain events. After configuring the schedule, click Next.

Review all the settings on the summary page to ensure they are correct. If everything looks good, click Finish to create the backup job.

If you want to run the backup job immediately to ensure everything works as expected, you can do so by right-clicking on the job and selecting Start. Alternatively, you can wait for the scheduled time to trigger the job automatically.

Restoring an entire VM

The restore and replication process for a full VM restore follows the standard procedures. However, it now includes the significant feature of cross-hypervisor restore. This functionality allows for the migration of VMs between different hypervisor types without compatibility issues. For example, when introducing Proxmox VE into a corporate setting, operators can effortlessly migrate VMs from an existing hypervisor to the Proxmox VE cluster. Should any issues arise during the testing phase, the process also supports the reverse migration back to the original hypervisor. Let us have a look at the details.

Choose the Entire VM restore option, which will launch the wizard for restoring a full virtual machine. The first step in the wizard will ask you to select a backup from which you want to restore. You will see a list of available backups; select the one that contains the VM you wish to restore and proceed to the next step by clicking Next.

Now, you must decide on the restore point. Typically, this will be the most recent backup, but you may choose an earlier point if necessary. After selecting the restore point, continue to the next step.

The wizard will then prompt you to specify the destination for the VM. This is where cross-hypervisor restore comes in handy: the destination can be the original location, or a new location if you are performing a migration or don’t want to overwrite the existing VM. Configure the network settings as required, ensuring that the restored VM will have the appropriate network access.

In the next step, you will have options regarding the power state of the VM after the restoration. You can choose to power on the VM automatically or leave it turned off, depending on your needs.

Before finalizing the restore process, review all the settings to make sure they align with your intended outcome. This is your chance to go back and make any necessary adjustments. Once you’re satisfied with the configuration, proceed to restore the VM by clicking Finish.

The restoration process will begin, and its progress can be monitored within the Veeam Backup & Replication console. Depending on the size of the VM and the performance of your backup storage and network, the restoration can take some time.

File-Level-Restore

Select Restore guest files. The wizard for file-level recovery will start, guiding you through the necessary steps. The first step involves choosing the VM backup from which you want to restore files. Browse through the list of available backups, select the appropriate one, and then click Next to proceed.

Choose the restore point that you want to use for the file-level restore. This is typically the most recent backup, but you can select an earlier one if needed. After picking the restore point, click Next to continue.

At this stage, you may need to choose the operating system of the VM that you are restoring files from. This is particularly important if the backup is of a different OS than the one on the Veeam Backup & Replication server because it will determine the type of helper appliance required for the restore.

Veeam Backup & Replication will prompt you to deploy a helper appliance if the backup is from an OS that is not natively supported by the Windows-based Veeam Backup & Replication server. Follow the on-screen instructions to deploy the helper appliance, which will facilitate the file-level restore process.

Once the helper appliance is ready, you will be able to browse the file system of the backup. Navigate through the backup to locate the files or folders you wish to restore.

After selecting the files or folders for restoration, you will be prompted to choose the destination where you want to restore the data. You can restore to the original location or specify a new location, depending on your requirements.

Review your selections to confirm that the correct files are being restored and to the right destination. If everything is in order, proceed with the restoration by clicking Finish.

The file-level restore process will start, and you can monitor the progress within the Veeam Backup & Replication console. The time it takes to complete the restore will depend on the size and number of files being restored, as well as the performance of your backup storage and network.

Conclusion

In summary, the latest update to Veeam introduces an important and welcome integration with Proxmox VE, filling a significant gap for enterprises that have adopted this open-source virtualization platform. By enabling direct backups and restores of entire VMs across different hypervisors without the need for in-VM agents, Veeam now offers considerable flexibility and simplicity in managing mixed environments. This advancement not only streamlines operations and enhances data protection strategies but also empowers organizations to easily migrate and evaluate new open-source virtualization platforms like Proxmox VE with minimal risk. It is great to see that more and more companies are putting effort into supporting open-source solutions, which underlines the ongoing importance of open-source based products in enterprises.

Additionally, for those starting fresh with Proxmox, the Proxmox Backup Server remains a viable open-source alternative and you can find our blog post about configuring the Proxmox Backup Server right here. Overall, this update represents a significant step forward in unifying virtualization and backup infrastructures, offering both versatility and ease of integration.

We are always here to help and assist you with further consulting, planning, and integration needs. Whether you are exploring new virtualization platforms, optimizing your current infrastructure, or looking for expert guidance on your backup strategies, our team is dedicated to ensuring your success every step of the way. Do not hesitate to reach out to us for personalized support and tailored solutions to meet your unique requirements in virtualization or backup environments.

In the world of virtualization, ensuring data redundancy and high availability is crucial. Proxmox Virtual Environment (PVE) is a powerful open-source platform for enterprise virtualization, combining KVM hypervisor and LXC containers. One of the key features that Proxmox offers is local storage replication, which helps in maintaining data integrity and availability in case of hardware failures. In this blog post, we will delve into the concept of local storage replication in Proxmox, its benefits, and how to set it up.

What is Local Storage Replication?

Local storage replication in Proxmox refers to the process of duplicating data from one local storage device to another within the same Proxmox cluster. This ensures that if one storage device fails, the data is still available on another device, thereby minimizing downtime and data loss. This is particularly useful in environments where high availability is critical.

Benefits

Data Redundancy: By replicating data across multiple storage devices, you ensure that a copy of your data is always available, even if one device fails.
High Availability: In the event of hardware failure, the system can quickly switch to the replicated data, ensuring minimal disruption to services.

Caveat

Please note that data written between the last synchronization and the failure of the node may be lost. If you cannot tolerate even a small data loss, use shared storage (Ceph, NFS, …) in the cluster instead.
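How large this window can become is easy to estimate. The following Python sketch uses a deliberately simplified model that is an assumption of this example, not something taken from the Proxmox documentation: a replication run starts every interval, the replica reflects the state at the start of a run, and a run needs sync_duration to finish. If the node fails just before a run completes, everything written since the start of the previous run is missing on the target.

```python
from datetime import timedelta


def worst_case_loss_window(interval: timedelta, sync_duration: timedelta) -> timedelta:
    """Upper bound for lost changes under the simplified model described above.

    The replica holds the state from the start of the last completed run.
    A failure right before the next run finishes therefore loses one full
    interval plus the time that run had already been transferring data.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if sync_duration < timedelta(0):
        raise ValueError("sync_duration must not be negative")
    return interval + sync_duration


if __name__ == "__main__":
    # Hypothetical values: a run every 15 minutes that takes about one minute.
    print(worst_case_loss_window(timedelta(minutes=15), timedelta(minutes=1)))
```

If the resulting window is larger than what the workload can tolerate, either shorten the schedule or move the guest to shared storage.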

Setting Up Local Storage Replication in Proxmox

Setting up local storage replication in Proxmox involves a few steps. Here’s a step-by-step guide to help you get started:

Step 1: Prepare Your Environment

Ensure that you have a Proxmox cluster set up with at least two nodes. Each node should have local ZFS storage configured.

Step 2: Configure Storage Replication

Access the Proxmox Web Interface: Log in to the Proxmox web interface.
Navigate to Datacenter: In the left-hand menu, click on Datacenter.
Select Storage: Under the Datacenter menu, click on Storage.
Add Storage: Click on Add and select the type of storage you want to replicate.
Configure Storage: Fill in the required details for the ZFS storage (one local storage per node).

Step 3: Set Up Replication

Navigate to the Node: In the left-hand menu, select the node where you want to set up replication.
Select the VM/CT: Click on the virtual machine (VM) or container (CT) you want to replicate.
Configure Replication: Go to the Replication tab and click on Add.
Select Target Node: Choose the target node where the data will be replicated to.
Schedule Replication: Set the replication schedule according to your needs (e.g. every 5 minutes, hourly).
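The same job can also be created on the command line with pvesr, the replication tool that ships with Proxmox VE. The Python sketch below only assembles the command so that it can be reviewed or handed to an automation tool; it does not execute anything. The VM ID 100, the job number 0 and the node name pve2 are hypothetical values, and the helper name is made up for this example.

```python
import shlex
from typing import List


def replication_job_command(vmid: int, job_number: int, target_node: str, schedule: str) -> List[str]:
    """Build the argument list for `pvesr create-local-job`.

    Replication job IDs have the form <VMID>-<JOBNUM>, for example 100-0.
    The schedule uses the Proxmox calendar event format, e.g. "*/5" for
    every five minutes.
    """
    if vmid < 1 or job_number < 0:
        raise ValueError("vmid must be positive and job_number non-negative")
    job_id = f"{vmid}-{job_number}"
    return ["pvesr", "create-local-job", job_id, target_node, "--schedule", schedule]


if __name__ == "__main__":
    # Hypothetical example: replicate VM 100 to node pve2 every five minutes.
    command = replication_job_command(100, 0, "pve2", "*/5")
    # shlex.join quotes the schedule so the shell does not expand the asterisk.
    print(shlex.join(command))
```

Run the printed command on the node that currently hosts the guest; afterwards the job appears in the Replication tab just like one created in the web interface.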

Step 4: Monitor Replication

Once replication is set up, you can monitor its status in the Replication tab. Proxmox provides detailed logs and status updates to help you ensure that replication is functioning correctly.

Best Practices for Local Storage Replication

Regular Backups: While replication provides redundancy, it is not a substitute for regular backups. Ensure that you have a robust backup strategy in place. Use tools like the Proxmox Backup Server (PBS) for this task.
Monitor Storage Health: Regularly check the health of your storage devices to preemptively address any issues.
Test Failover: Periodically test the failover process to ensure that your replication setup works as expected in case of an actual failure.
Optimize Replication Schedule: Balance the replication frequency with your performance requirements and network bandwidth to avoid unnecessary load.

Conclusion

Local storage replication in Proxmox is a powerful feature that enhances data redundancy and high availability. By following the steps outlined in this blog post, you can set up and manage local storage replication in your Proxmox environment, ensuring that your data remains safe and accessible even in the face of hardware failures. Remember to follow best practices and regularly monitor your replication setup to maintain optimal performance and reliability.

You can find further information here about the Proxmox storage replication:

https://pve.proxmox.com/wiki/Storage_Replication
https://pve.proxmox.com/pve-docs/chapter-pvesr.html

Happy virtualizing!

With version 256, systemd introduced run0. Lennart Poettering describes run0 as an alternative to sudo and explains on Mastodon at the same time what he sees as the problem with sudo.

In this blog post, however, we do not want to go into the strengths or weaknesses of sudo, but take a closer look at run0 and use it as a sudo alternative.

Unlike sudo, run0 uses neither the configuration file /etc/sudoers nor a SUID bit to extend user permissions. In the background, it uses systemd-run to start new processes, which has been in systemd for several years.

PolKit is used when it comes to checking whether a user has the appropriate permissions to use run0. All rules that the configuration of PolKit provides can be used here. In our example, we will concentrate on a simple variant.

Experimental Setup

For our example, we use a t2.micro EC2 instance with Debian Bookworm. Since run0 was only introduced in systemd version 256 and Debian Bookworm is still delivered with version 252 at the current time, we must first add the Debian Testing Repository.

❯ ssh admin@2a05:d014:ac8:7e00:c4f4:af36:3938:206e
…

admin@ip-172-31-15-135:~$ sudo su -

root@ip-172-31-15-135:~# cat <<EOF > /etc/apt/sources.list.d/testing.list
> deb https://deb.debian.org/debian testing main
> EOF

root@ip-172-31-15-135:~# apt update
Get:1 file:/etc/apt/mirrors/debian.list Mirrorlist [38 B]
Get:5 file:/etc/apt/mirrors/debian-security.list Mirrorlist [47 B]
Get:7 https://deb.debian.org/debian testing InRelease [169 kB]
Get:2 https://cdn-aws.deb.debian.org/debian bookworm InRelease [151 kB]
…
Fetched 41.3 MB in 6s (6791 kB/s)
Reading package lists... Done 
Building dependency tree... Done 
Reading state information... Done 
299 packages can be upgraded. Run 'apt list --upgradable' to see them. 

root@ip-172-31-15-135:~# apt-cache policy systemd
systemd:
Installed: 252.17-1~deb12u1
Candidate: 256.1-2
Version table:
256.1-2 500
500 https://deb.debian.org/debian testing/main amd64 Packages
254.5-1~bpo12+3 100
100 mirror+file:/etc/apt/mirrors/debian.list bookworm-backports/main amd64 Packages
252.22-1~deb12u1 500
500 mirror+file:/etc/apt/mirrors/debian.list bookworm/main amd64 Packages
*** 252.17-1~deb12u1 100
100 /var/lib/dpkg/status
root@ip-172-31-15-135:~# apt-get install systemd
…

root@ip-172-31-15-135:~# dpkg -l | grep systemd
ii libnss-resolve:amd64 256.1-2 amd64 nss module to resolve names via systemd-resolved
ii libpam-systemd:amd64 256.1-2 amd64 system and service manager - PAM module
ii libsystemd-shared:amd64 256.1-2 amd64 systemd shared private library
ii libsystemd0:amd64 256.1-2 amd64 systemd utility library
ii systemd 256.1-2 amd64 system and service manager
ii systemd-cryptsetup 256.1-2 amd64 Provides cryptsetup, integritysetup and veritysetup utilities
ii systemd-resolved 256.1-2 amd64 systemd DNS resolver
ii systemd-sysv 256.1-2 amd64 system and service manager - SysV compatibility symlinks
ii systemd-timesyncd 256.1-2 amd64 minimalistic service to synchronize local time with NTP servers

root@ip-172-31-15-135:~# reboot
…
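Whether a given host already ships a systemd that includes run0 can be checked by looking at the major version of the installed package, as done by hand in the transcript above. The following Python sketch performs that comparison on a Debian-style version string; the function names are made up for this example, and the parsing only covers the simple version formats shown above.

```python
import re

# run0 was introduced with systemd 256, as stated above.
RUN0_MIN_SYSTEMD_MAJOR = 256


def systemd_major(package_version: str) -> int:
    """Extract the major version from a Debian package version string.

    An optional epoch prefix such as "1:" is skipped.
    """
    match = re.match(r"(?:\d+:)?(\d+)", package_version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {package_version!r}")
    return int(match.group(1))


def provides_run0(package_version: str) -> bool:
    """Return True if this systemd version is new enough to include run0."""
    return systemd_major(package_version) >= RUN0_MIN_SYSTEMD_MAJOR


if __name__ == "__main__":
    # Version strings taken from the apt-cache output above.
    for version in ("252.17-1~deb12u1", "254.5-1~bpo12+3", "256.1-2"):
        print(version, provides_run0(version))
```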

The user admin is used for the initial login. This user has already been stored in the file /etc/sudoers.d/90-cloud-init-users by cloud-init and can therefore execute any sudo commands without being prompted for a password.

sudo cat /etc/sudoers.d/90-cloud-init-users
# Created by cloud-init v. 22.4.2 on Thu, 27 Jun 2024 09:22:48 +0000

# User rules for admin
admin ALL=(ALL) NOPASSWD:ALL

Analogous to sudo, we now want to enable run0 for the user admin.

Without further configuration, the user admin receives a login prompt asking for the root password. This is the default behavior of PolKit.

admin@ip-172-31-15-135:~$ run0
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ====
Authentication is required to manage system services or other units.
Authenticating as: Debian (admin)
Password:

Since this does not correspond to the behavior we want, we have to help a little in the form of a PolKit rule. Additional PolKit rules are stored under /etc/polkit-1/rules.d/.

root@ip-172-31-15-135:~# cat <<EOF > /etc/polkit-1/rules.d/99-run0.rules
> polkit.addRule(function(action, subject) {
>   if (action.id == "org.freedesktop.systemd1.manage-units") {
>     if (subject.user === "admin") {
>       return polkit.Result.YES;
>     }
>   }
> });
> EOF

The rule used is structured as follows: First, it is checked whether the action listed is org.freedesktop.systemd1.manage-units. If this is the case, it is checked whether the executing user is the user admin. If both requirements are met, our rule returns “YES”, which means that no further checks (e.g. password query) are necessary.

Alternatively, it could also be checked whether the executing user belongs to a specific group, such as admin or sudo (if (subject.isInGroup("admin"))). It would also be conceivable to ask the user for their own password instead of the root password.
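Both alternatives can be combined in one rule. The Python sketch below merely renders such a rule file as text, so it can be reviewed before being written to /etc/polkit-1/rules.d/; it does not touch the system. The group name sudo and the helper name are assumptions of this example. polkit.Result.AUTH_SELF is the PolKit result that asks users for their own password.

```python
import json

ACTION_ID = "org.freedesktop.systemd1.manage-units"
# Subset of PolKit results that make sense for this sketch.
RESULTS = ("YES", "AUTH_SELF", "AUTH_ADMIN")


def render_group_rule(group: str, result: str = "AUTH_SELF") -> str:
    """Render a PolKit rule that applies `result` to members of `group`."""
    if result not in RESULTS:
        raise ValueError(f"unsupported result: {result}")
    lines = [
        "polkit.addRule(function(action, subject) {",
        # json.dumps produces a correctly quoted JavaScript string literal.
        f"  if (action.id == {json.dumps(ACTION_ID)} &&",
        f"      subject.isInGroup({json.dumps(group)})) {{",
        f"    return polkit.Result.{result};",
        "  }",
        "});",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example: members of the group sudo confirm with their own password.
    print(render_group_rule("sudo"), end="")
```

Generating the file from a small helper like this keeps the rule consistent across hosts, for example when it is rolled out by a configuration management tool.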

The new rule is automatically read in by PolKit and can be used immediately. The log output of the PolKit service shows whether there were any errors when reading in the new rules. After the configuration of PolKit, the user admin can now execute run0 analogously to our initial sudo configuration.

Process Structure

The following listing shows the difference in the call stack between sudo and run0. While sudo starts separate child processes, run0 starts a new process via systemd-run.

admin@ip-172-31-15-135:~$ sudo su -
root@ip-172-31-15-135:~# ps fo tty,ruser,ppid,pid,sess,cmd
TT RUSER PPID PID SESS CMD
pts/2 admin 1484 1514 1484 sudo su -
pts/0 admin 1514 1515 1515 \_ sudo su -
pts/0 root 1515 1516 1515 \_ su -
pts/0 root 1516 1517 1515 \_ -bash
pts/0 root 1517 1522 1515 \_ ps fo tty,ruser,ppid,pid,sess,cmd

admin@ip-172-31-15-135:~$ run0
root@ip-172-31-15-135:/home/admin# ps fo tty,ruser,ppid,pid,sess,cmd
TT RUSER PPID PID SESS CMD
pts/0 root 1 1562 1562 -/bin/bash
pts/0 root 1562 1567 1562 \_ ps fo tty,ruser,ppid,pid,sess,cmd
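The difference becomes even clearer when the parent chain is followed upwards. The Python sketch below does this for the PPID and PID columns copied from the two listings above (the ps command line is abbreviated to ps); the function names are made up for this example.

```python
# (ppid, pid, cmd) rows copied from the two ps listings above.
SUDO_ROWS = [
    (1484, 1514, "sudo su -"),
    (1514, 1515, "sudo su -"),
    (1515, 1516, "su -"),
    (1516, 1517, "-bash"),
    (1517, 1522, "ps"),
]
RUN0_ROWS = [
    (1, 1562, "-/bin/bash"),
    (1562, 1567, "ps"),
]


def ancestry(rows, pid):
    """Return the commands from `pid` upwards, as far as the listing goes."""
    by_pid = {row_pid: (ppid, cmd) for ppid, row_pid, cmd in rows}
    chain = []
    while pid in by_pid:
        ppid, cmd = by_pid[pid]
        chain.append(cmd)
        pid = ppid
    return chain


def oldest_parent(rows, pid):
    """Return the first parent PID that is not part of the listing itself."""
    by_pid = {row_pid: ppid for ppid, row_pid, _cmd in rows}
    while pid in by_pid:
        pid = by_pid[pid]
    return pid


if __name__ == "__main__":
    print(ancestry(SUDO_ROWS, 1522), oldest_parent(SUDO_ROWS, 1522))
    print(ancestry(RUN0_ROWS, 1567), oldest_parent(RUN0_ROWS, 1567))
```

With sudo, the root shell hangs below the processes of the calling user (the chain ends at PID 1484 in the user session), whereas the shell started by run0 is a direct child of PID 1.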

Conclusion and Note

As the example above has shown, run0 can generally be used as a simple sudo alternative and offers some security-relevant advantages. If run0 prevails over sudo, this will not happen within the next year. Some distributions simply lack a sufficiently up-to-date systemd version. In addition, the configuration of PolKit is not one of the daily tasks for some admins and know-how must first be built up here in order to transfer any existing sudo “constructs”.

In addition, a decisive advantage of run0 should not be ignored: By default, it colors the background red! 😉

If you had the choice, would you rather take Salsa or Guacamole? Let me explain why you should choose Guacamole over Salsa.

In this blog article, we want to take a look at one of the smaller Apache projects out there called Apache Guacamole. Apache Guacamole allows administrators to run a web based client tool for accessing remote applications and servers. This can include remote desktop systems, applications or terminal sessions. Users can simply access them by using their web browsers. No special client or other tools are required. From there, they can login and access all pre-configured remote connections that have been specified by an administrator.

Guacamole supports a wide variety of protocols such as VNC, RDP, and SSH. This way, users can access basically anything from remote terminal sessions to full-fledged graphical user interfaces provided by operating systems like Debian, Ubuntu, Windows and many more.

Convert every window application to a web application

If we spin this idea further, technically every window application that isn’t designed to run as a web application can be transformed into one by using Apache Guacamole. We helped a customer bring their legacy application to Kubernetes, so that other users could use their web browsers to run it. Sure, implementing the application from the ground up, so that it follows the Cloud Native principles, is the preferred solution. As always though, effort, experience and costs may exceed the available time and budget, and in those cases Apache Guacamole can provide a relatively easy way of realizing such projects.

In this blog article, I want to show you how easy it is to run a legacy window application as a web app on Kubernetes. For this, we will use a Kubernetes cluster created by kind and create a Kubernetes Deployment to make kate, a KDE-based text editor, our own web application. It’s just an example, so there might be better applications to transform, but this one should be fine to show you the concepts behind Apache Guacamole.

So, without further ado, let’s create our kate web application.

Preparation of Kubernetes

Before we can start, we must make sure that we have a Kubernetes cluster that we can test on. If you already have a cluster, simply skip this section. If not, let’s spin one up by using kind.

kind (Kubernetes in Docker) is a lightweight tool that runs a local Kubernetes cluster inside Docker containers and therefore works on most machines. It’s written in Go and can be installed like this:

# For AMD64 / x86_64
[ $(uname -m) = x86_64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
# For ARM64
[ $(uname -m) = aarch64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-arm64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind

Next, we need to install some dependencies for our cluster. This includes for example docker and kubectl.

$ sudo apt install docker.io kubernetes-client

Creating our Kubernetes cluster with kind requires Docker because the cluster runs within Docker containers on your host machine. Installing kubectl allows us to access the Kubernetes cluster after creating it.

Once we installed those packages, we can start to create our cluster now. First, we must define a cluster configuration. It defines which ports are accessible from our host machine, so that we can access our Guacamole application. Remember, the cluster itself is operated within Docker containers, so we must ensure that we can access it from our machine. For this, we define the following configuration which we save in a file called cluster.yaml:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 30000
    hostPort: 30000
    listenAddress: "127.0.0.1"
    protocol: TCP

Hereby, we basically map the container’s port 30000 to our local machine’s port 30000, so that we can easily access it later on. Keep this in mind because it will be the port that we will use with our web browser to access our kate instance.

Ultimately, this configuration is consumed by kind. Besides the port configuration, it also lets you adjust many other parameters of your cluster that are not covered here. It’s worth taking a look at kind’s documentation for this.
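Once the cluster is running and an application is exposed on that port, the mapping can be verified from the host. The following Python sketch simply tries to open a TCP connection; the helper name is made up for this example, and the check is expected to fail as long as nothing is listening on the port inside the cluster.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Address and port as defined in cluster.yaml above.
    print(port_is_open("127.0.0.1", 30000))
```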

As soon as you saved the configuration to cluster.yaml, we can now start to create our cluster:

$ sudo kind create cluster --name guacamole --config cluster.yaml
Creating cluster "guacamole" ...
 ✓ Ensuring node image (kindest/node:v1.29.2) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-guacamole"
You can now use your cluster with:

kubectl cluster-info --context kind-guacamole

Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂

Since we don’t want to run everything in the root context, let’s export the kubeconfig so that we can use it with kubectl as our unprivileged user:

$ sudo kind export kubeconfig \
    --name guacamole \
    --kubeconfig $PWD/config

$ export KUBECONFIG=$PWD/config
$ sudo chown $(logname): $KUBECONFIG

By doing so, we are ready and can access our Kubernetes cluster using kubectl now. This is our baseline to start migrating our application.

Creation of the Guacamole Deployment

In order to run our application on Kubernetes, we need some sort of workload resource. Typically, you could create a Pod, Deployment, StatefulSet or DaemonSet to run workloads on a cluster.

Let’s create the Kubernetes Deployment for our own application. The example below shows the deployment’s general structure. Each container definition is explained in more detail with its own dedicated example afterwards.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: web-based-kate
  name: web-based-kate
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-based-kate
  template:
    metadata:
      labels:
        app: web-based-kate
    spec:
      containers:
      # The guacamole server component that each
      # user will connect to via their browser
      - name: guacamole-server
        image: docker.io/guacamole/guacamole:1.5.4
        ...
      # The daemon that opens the connection to the
      # remote entity
      - name: guacamole-guacd
        image: docker.io/guacamole/guacd:1.5.4
        ...
      # Our own self written application that we
      # want to make accessible via the web.
      - name: web-based-kate
        image: registry.example.com/own-app/web-based-kate:0.0.1
        ...
      volumes:
        - name: guacamole-config
          secret:
            secretName: guacamole-config
        - name: guacamole-server
          emptyDir: {}
        - name: web-based-kate-home
          emptyDir: {}
        - name: web-based-kate-tmp
          emptyDir: {}

As you can see, we need three containers and some volumes for our application. The first two containers are dedicated to Apache Guacamole itself. The first one is the server component, which is the external endpoint for clients to access our web application. It provides the web server as well as the user management and configuration needed to run Apache Guacamole.

Next to this, there is the guacd daemon. This is the core component of Guacamole which creates the remote connections to the application based on the configuration done to the server. This daemon forwards the remote connection to the clients by making it accessible to the Guacamole server which then forwards the connection to the end user.

Finally, we have our own application. It will offer a connection endpoint to the guacd daemon using one of Guacamole’s supported protocols and provide the Graphical User Interface (GUI).

Guacamole Server

Now, let’s take a deep dive into each container specification. We start with the Guacamole server instance. It handles the session and user management and contains the configuration which defines which remote connections are available.

- name: guacamole-server
  image: docker.io/guacamole/guacamole:1.5.4
  env:
    - name: GUACD_HOSTNAME
      value: "localhost"
    - name: GUACD_PORT
      value: "4822"
    - name: GUACAMOLE_HOME
      value: "/data/guacamole/settings"
    - name: HOME
      value: "/data/guacamole"
    - name: WEBAPP_CONTEXT
      value: ROOT
  volumeMounts:
    - name: guacamole-config
      mountPath: /data/guacamole/settings
    - name: guacamole-server
      mountPath: /data/guacamole
  ports:
    - name: http
      containerPort: 8080
  securityContext:
    allowPrivilegeEscalation: false
    privileged: false
    readOnlyRootFilesystem: true
    capabilities:
      drop: ["all"]
  resources:
    limits:
      cpu: "250m"
      memory: "256Mi"
    requests:
      cpu: "250m"
      memory: "256Mi"

Since it needs to connect to the guacd daemon, we have to provide the connection information for guacd by passing it into the container using environment variables like GUACD_HOSTNAME and GUACD_PORT. In addition, Guacamole would usually be accessible via http://<your domain>/guacamole.

This behavior, however, can be adjusted by modifying the WEBAPP_CONTEXT environment variable. In our case, for example, we don’t want users to type in /guacamole to access it but simply use http://<your domain>/ instead.

Guacamole Guacd

Then, there is the guacd daemon.

- name: guacamole-guacd
  image: docker.io/guacamole/guacd:1.5.4
  args:
    - /bin/sh
    - -c
    - /opt/guacamole/sbin/guacd -b 127.0.0.1 -L $GUACD_LOG_LEVEL -f
  securityContext:
    allowPrivilegeEscalation: true
    privileged: false
    readOnlyRootFilesystem: true
    capabilities:
      drop: ["all"]
  resources:
    limits:
      cpu: "250m"
      memory: "512Mi"
    requests:
      cpu: "250m"
      memory: "512Mi"

It’s worth mentioning that you should modify the arguments used to start the guacd container. In the example above, we want guacd to listen only on localhost for security reasons. All containers within the same pod share the same network namespace. As a result, they can access each other via localhost. That said, there is no need to make this service accessible to other services running outside of this pod, so we can limit it to localhost only. To achieve this, you need to set the -b 127.0.0.1 parameter which sets the corresponding listen address. Since you need to overwrite the whole command, don’t forget to also specify the -L and -f parameters. The first one sets the log level and the second one keeps the process in the foreground.

Web Based Kate

To finish everything off, we have the kate application which we want to transform to a web application.

- name: web-based-kate
  image: registry.example.com/own-app/web-based-kate:0.0.1
  env:
    - name: VNC_SERVER_PORT
      value: "5900"
    - name: VNC_RESOLUTION_WIDTH
      value: "1280"
    - name: VNC_RESOLUTION_HEIGHT
      value: "720"
  securityContext:
    allowPrivilegeEscalation: true
    privileged: false
    readOnlyRootFilesystem: true
    capabilities:
      drop: ["all"]
  volumeMounts:
    - name: web-based-kate-home
      mountPath: /home/kate
    - name: web-based-kate-tmp
      mountPath: /tmp

Configuration of our Guacamole setup

With the deployment in place, we need to prepare the configuration for our Guacamole setup. So that Guacamole knows which users exist and which connections should be offered, we need to provide a mapping configuration.

In this example, a simple user mapping is shown for demonstration purposes. It uses a static mapping defined in an XML file that is handed over to the Guacamole server. Typically, you would use other authentication methods instead, like a database or LDAP.

That said, let’s continue with our static one. For this, we simply define a Kubernetes Secret which is mounted into the Guacamole server. It defines two configuration files. One is the so-called guacamole.properties, Guacamole’s main configuration file. In addition, we define the user-mapping.xml which contains all available users and their connections.

apiVersion: v1
kind: Secret
metadata:
  name: guacamole-config
stringData:
  guacamole.properties: |
    enable-environment-properties: true
  user-mapping.xml: |
    <user-mapping>
      <authorize username="admin" password="PASSWORD" encoding="sha256">
        <connection name="web-based-kate">
          <protocol>vnc</protocol>
          <param name="hostname">localhost</param>
          <param name="port">5900</param>
        </connection>
      </authorize>
    </user-mapping>

As you can see, we only defined one specific user called admin which can use a connection called web-based-kate. In order to access the kate instance, Guacamole uses VNC as the configured protocol. To make this happen, our web application must offer a VNC server port on the other side, so that the guacd daemon can access it to forward the remote session to clients. Keep in mind that you need to replace the string PASSWORD with the SHA-256 sum of the actual password. The SHA-256 sum could look like this, for example:

$ echo -n "test" | sha256sum
9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08  -
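
The hashing and the replacement of the PASSWORD placeholder can be scripted. The following is a minimal sketch; the function names hash_password and render_user_mapping are hypothetical helpers of our own and not part of Guacamole.

```shell
#!/bin/sh
# Hypothetical helper: print the SHA-256 hex digest of a password.
# printf '%s' avoids hashing a trailing newline (same as `echo -n`).
hash_password() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Hypothetical helper: read a user mapping from stdin and replace the
# PASSWORD placeholder with the digest of the given password.
render_user_mapping() {
  sed "s/password=\"PASSWORD\"/password=\"$(hash_password "$1")\"/"
}
```

For example, render_user_mapping 'test' < user-mapping.xml prints the mapping with the digest shown above filled in, assuming the mapping was saved to a file of that name.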

Next, the hostname parameter is referencing the corresponding VNC server of our kate container. Since we are starting our container alongside with our Guacamole containers within the same pod, the Guacamole Server as well as the guacd daemon can access this application via localhost. There is no need to set up a Kubernetes Service in front of it since only guacd will access the VNC server and forward the remote session via HTTP to clients accessing Guacamole via their web browsers. Finally, we also need to specify the VNC server port which is typically 5900 but this could be adjusted if needed.

The corresponding guacamole.properties is quite short. By enabling the enable-environment-properties configuration parameter, we make sure that every Guacamole configuration parameter can also be set via environment variables. This way, we don’t need to modify this configuration file every time we want to adjust the configuration; we only need to provide updated environment variables to the Guacamole server container.

Make Guacamole accessible

Last but not least, we must make the Guacamole server accessible for clients. Although each provided service can access each other via localhost, the same does not apply to clients trying to access Guacamole. Therefore, we must make Guacamole’s server port 8080 available to the outside world. This can be achieved by creating a Kubernetes Service of type NodePort. This service is forwarding each request from a local node port to the corresponding container that is offering the configured target port. In our case, this would be the Guacamole server container which is offering port 8080.

apiVersion: v1
kind: Service
metadata:
  name: web-based-kate
spec:
  type: NodePort
  selector:
    app: web-based-kate
  ports:
    - name: http
      protocol: TCP
      port: 8080
      targetPort: 8080
      nodePort: 30000

The service port is then mapped to port 30000 on the node, and we configured the kind cluster in such a way that it forwards node port 30000 to port 30000 on the host system. This is the port that we need to use to access Guacamole with our web browser.

Preparation of the Application container

Before we can start to deploy our application, we need to prepare our kate container. For this, we simply create a Debian container that runs kate. Keep in mind that you would typically use lightweight base images like alpine to run applications like this. For this demonstration, however, we use the Debian image since it is easier to spin up, even though only a small fraction of the functionality provided by this base image is needed. Moreover, from a security point of view, you want to keep your images small to minimize the attack surface and make them easier to maintain. For now, however, we will continue with the Debian image.

In the example below, you can see a Dockerfile for the kate container.

FROM debian:12

# Install all required packages
RUN apt update && \
    apt install -y x11vnc xvfb kate

# Add user for kate
RUN adduser kate --system --home /home/kate --uid 999

# Copy our entrypoint in the container
COPY entrypoint.sh /opt

USER 999
ENTRYPOINT [ "/opt/entrypoint.sh" ]

Here you see that we create a dedicated user called kate (user ID 999) for which we also create a home directory. This home directory is used for all files that kate creates at runtime. Since we set readOnlyRootFilesystem to true, we must make sure that we mount some sort of writable volume (e.g. an emptyDir) to kate’s home directory. Otherwise, kate wouldn’t be able to write any runtime data.

Moreover, we have to install the following three packages:

x11vnc
xvfb
kate

These are the only packages we need for our container. In addition, we also need to create an entrypoint script to start the application and prepare the container accordingly. This entrypoint script creates the configuration for kate, starts it in a virtual display by using xvfb-run and provides this virtual display to end users through the VNC server x11vnc. In the meantime, xdriinfo is used to check whether the virtual display came up successfully after starting kate. If it takes too long, the entrypoint script fails with exit code 1.

By doing this, we ensure that the container is not stuck in an infinite loop during a failure and let Kubernetes restart the container whenever it couldn’t start the application successfully. Furthermore, it is important to check that the virtual display came up before handing it over to the VNC server, because the VNC server would crash if the virtual display is not up and running since it needs something to share. On the other hand, our container will exit whenever kate is terminated: this also terminates the virtual display, which in turn terminates the VNC server and lets the container exit, too. This way, we don’t need to take care of it ourselves.

#!/bin/bash

set -e

# If no resolution is provided
if [ -z "$VNC_RESOLUTION_WIDTH" ]; then
  VNC_RESOLUTION_WIDTH=1920
fi

if [ -z "$VNC_RESOLUTION_HEIGHT" ]; then
  VNC_RESOLUTION_HEIGHT=1080
fi

# If no server port is provided
if [ -z "$VNC_SERVER_PORT" ]; then
  VNC_SERVER_PORT=5900
fi

# Prepare configuration for kate
mkdir -p $HOME/.local/share/kate
echo "[MainWindow0]
"$VNC_RESOLUTION_WIDTH"x"$VNC_RESOLUTION_HEIGHT" screen: Height=$VNC_RESOLUTION_HEIGHT
"$VNC_RESOLUTION_WIDTH"x"$VNC_RESOLUTION_HEIGHT" screen: Width=$VNC_RESOLUTION_WIDTH
"$VNC_RESOLUTION_WIDTH"x"$VNC_RESOLUTION_HEIGHT" screen: XPosition=0
"$VNC_RESOLUTION_WIDTH"x"$VNC_RESOLUTION_HEIGHT" screen: YPosition=0
Active ViewSpace=0
Kate-MDI-Sidebar-Visible=false" > $HOME/.local/share/kate/anonymous.katesession

# We need to define an XAuthority file
export XAUTHORITY=$HOME/.Xauthority

# Define execution command
APPLICATION_CMD="kate"

# Let's start our application in a virtual display
xvfb-run \
  -n 99 \
  -s ':99 -screen 0 '$VNC_RESOLUTION_WIDTH'x'$VNC_RESOLUTION_HEIGHT'x16' \
  -f $XAUTHORITY \
  $APPLICATION_CMD &

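# Let's wait until the virtual display is initialized before
# we proceed. But don't wait infinitely: give up with exit
# code 1 once the timeout (in seconds) is used up.
TIMEOUT=10
while ! xdriinfo -display :99 nscreens; do
  if [ "$TIMEOUT" -le 0 ]; then
    echo "Virtual display :99 did not come up in time" >&2
    exit 1
  fi
  sleep 1
  TIMEOUT=$((TIMEOUT - 1))
done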

# Now, let's make the virtual display accessible by
# exposing it via the VNC Server that is listening on
# localhost and the specified port (e.g. 5900)
x11vnc \
  -display :99 \
  -nopw \
  -localhost \
  -rfbport $VNC_SERVER_PORT \
  -forever
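
The wait-with-timeout step of the entrypoint script can also be factored into a small reusable function. The following is a minimal sketch; wait_for is a hypothetical helper name of our own, and it retries an arbitrary command once per second.

```shell
#!/bin/sh
# Hypothetical helper: wait_for SECONDS COMMAND [ARGS...]
# Retries COMMAND once per second until it succeeds.
# Returns 0 on success and 1 once SECONDS retries are used up.
wait_for() {
  remaining=$1
  shift
  until "$@" >/dev/null 2>&1; do
    if [ "$remaining" -le 0 ]; then
      return 1
    fi
    sleep 1
    remaining=$((remaining - 1))
  done
  return 0
}
```

In the entrypoint script, this could be used as wait_for 10 xdriinfo -display :99 nscreens || exit 1.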

After preparing those files, we can now create our image and import it to our Kubernetes cluster by using the following commands:

# Do not forget to give your entrypoint script
# the proper permissions to be executed
$ chmod +x entrypoint.sh

# Next, build the image and import it into kind,
# so that it can be used from within the clusters.
$ sudo docker build -t registry.example.com/own-app/web-based-kate:0.0.1 .
$ sudo kind load -n guacamole docker-image registry.example.com/own-app/web-based-kate:0.0.1

The image will be imported to kind, so that every workload resource operated in our kind cluster can access it. If you use some other Kubernetes cluster, you would need to upload this to a registry that your cluster can pull images from.

Finally, we can also apply our previously created Kubernetes manifests to the cluster. Let’s say we saved everything to one file called kubernetes.yaml. Then, you can simply apply it like this:

$ kubectl apply -f kubernetes.yaml
deployment.apps/web-based-kate configured
secret/guacamole-config configured
service/web-based-kate unchanged

This way, a Kubernetes Deployment, Secret and Service are created, which ultimately results in a Kubernetes Pod that we can access afterwards.

$ kubectl get pod
NAME                              READY   STATUS    RESTARTS   AGE
web-based-kate-7894778fb6-qwp4z   3/3     Running   0          10m
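
Instead of polling kubectl get pod by hand, we can let kubectl wait until the rollout has finished and the pod is ready. This is a sketch only: the function name wait_for_kate and the 120 second timeout are our own assumptions, and the commands are wrapped in a function so that nothing runs until it is called.

```shell
#!/bin/sh
# Hypothetical helper: block until the web-based-kate Deployment is rolled
# out and its pod reports Ready. The 120s timeout is an arbitrary choice.
wait_for_kate() {
  kubectl rollout status deployment/web-based-kate --timeout=120s &&
    kubectl wait pod -l app=web-based-kate --for=condition=Ready --timeout=120s
}
```

Calling wait_for_kate right after kubectl apply returns a non-zero exit code if the pod does not become ready in time.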

Verification of our Deployment

Now, it’s money time! After preparing everything, we should be able to access our web based kate application by using our web browser. As mentioned earlier, we configured kind in such a way that we can access our application by using our local port 30000. Every request to this port is forwarded to the kind control plane node from where it is picked up by the Kubernetes Service of type NodePort. This one then forwards all requests to our designated Guacamole server container which provides the web server for accessing remote applications via Guacamole.

If everything works out, you should be able to see the following login screen:

After successfully logging in, the remote connection is established and you should be able to see the welcome screen from kate:

If you click on New, you can create a new text file:

Those text files can even be saved, but keep in mind that they will only exist as long as our Kubernetes Pod exists. Once it gets deleted, the corresponding emptyDir that we mounted into our kate container gets deleted as well and all files in it are lost. Moreover, the container’s root filesystem is read-only, meaning that a user can only write files to the volumes (e.g. emptyDir) that we mounted into our container.

Conclusion

After seeing that it’s relatively easy to convert every application to a web based one by using Apache Guacamole, there is only one major question left…

Which do you prefer, salsa or guacamole?

Integrating Proxmox Backup Server into Proxmox Clusters

Proxmox Backup Server

In today’s digital landscape, where data reigns supreme, ensuring its security and integrity is paramount for businesses of all sizes. Enter Proxmox Backup Server, a robust solution poised to revolutionize data protection strategies with its unparalleled features and open-source nature.

At its core, Proxmox Backup Server is a comprehensive backup solution designed to safeguard critical data and applications effortlessly in virtualized environments based on Proxmox VE. Unlike traditional backup methods, Proxmox Backup Server offers a streamlined approach, simplifying the complexities associated with data backup and recovery.

One of the standout features of Proxmox Backup Server is its seamless integration with Proxmox Virtual Environment (PVE), creating a cohesive ecosystem for managing virtualized environments. This integration allows for efficient backup and restoration of Linux containers and virtual machines, ensuring minimal downtime and maximum productivity. Without the need for a backup client on each container or virtual machine, this solution can still back up and restore entire systems as well as single files directly from the filesystem.

Proxmox Backup Server provides a user friendly interface, making it accessible to both seasoned IT professionals and newcomers alike. With its intuitive design, users can easily configure backup tasks, monitor progress, and retrieve data with just a few clicks, eliminating the need for extensive training or technical expertise.

Data security is a top priority for businesses across industries and Proxmox Backup Server delivers on this front. Combined with solutions like ZFS, it brings in enterprise filesystem features such as encryption at rest, encryption in transit, checksums, snapshots, deduplication and compression. iSCSI or NFS storage from enterprise storage solutions such as NetApp can be integrated as well.

Another notable aspect of Proxmox Backup Server is its cost effectiveness. As an open-source solution, it eliminates the financial barriers associated with proprietary backup software, and the same applies to Proxmox VE.

Integrating Proxmox Backup Server into Proxmox Clusters

General

This guide expects you to already have at least one Proxmox VE system up and running as well as a system with a basic installation of Proxmox Backup Server. In this example, the Proxmox Backup Server is installed on a single disk, and the datastore is placed on an additional block device that holds the backups. The Proxmox VE and Proxmox Backup Server instances do not need to be in the same network but must be able to reach each other. The integration requires administrative access to the datacenter of the Proxmox VE instance(s) and to the Backup Server.

Prerequisites

Proxmox VE (including the datacenter).
Proxmox Backup Server (basic installation).
Administrative access to all systems.
Network reachability.
Storage device holding the backups (in this case a dedicated block storage device).

Administration: Proxmox Backup Server

Like the Proxmox VE environment, the Proxmox Backup Server comes along with a very intuitive web frontend. Unlike the web frontend of Proxmox VE, which runs on tcp/8006, the Proxmox Backup Server can be reached on tcp/8007. Therefore, all next tasks will be done on https://<IP-PROXMOX-BACKUP-SERVER>:8007.

After logging in to the web frontend, the dashboard overview welcomes the user.

Adding Datastore / Managing Storage

The first and most important task is managing the storage and adding a usable datastore that holds the backup data of the virtualization environment. To do so, we switch to the Administration section and click on Storage / Disks. This provides an overview of the available devices on the Proxmox Backup Server. As mentioned before, this example uses a dedicated block storage device with ZFS to benefit from checksums, deduplication and compression. ZFS can of course also be used with multiple disks (so-called RAID-Z levels), and other solutions such as a directory or an NFS share work as well. Coming back to our example, we can see the empty /dev/sdb device which will be used to store all backup files.

By clicking on ZFS in the top menu bar, a ZFS pool can be created as a datastore. In this dialog, a name, the RAID level, the compression and the devices to use must be defined. As already mentioned, we can attach multiple disks and define a desired RAID level. The given example only consists of a single disk, which is selected here. Compression is optional, but using LZ4 is recommended. As a lossless data compression algorithm, LZ4 aims to provide a good trade-off between speed and compression ratio, and its overhead is barely noticeable on today’s systems.

Make sure the Add as Datastore option (default) is checked; it directly creates a usable datastore with the given name. In our example this will be backup01.

Keep in mind that this part is not needed when using an NFS share. Also, do not use ZFS on top of hardware RAID controllers.

Adding User for Backup

In a next step, a dedicated user will be created that will be used for the datastore permissions and for the Proxmox VE instances for authentication and authorization. This allows even complex setups with different datastores, different users including different access levels (e.g., reading, writing, auditing,…) on different clusters and instances. To keep it simple for demonstrations, just a single user for everything will be used.

A new user is configured by selecting Configuration, Access Control and User Management in the left menu. There, a new user can be created by simply defining a name and a password. The realm should stay on the default, the Proxmox Backup authentication provider. Depending on the complexity of your naming scheme, choose a meaningful user name. In the given example, the user is called dc01cluster22backup01.

Adding Permission of User for Datastore

As mentioned, complex setups regarding authentication and authorization are possible, but the datastore must be linked to at least one user that can access it. Therefore, we go back to the Datastore and select the previously created backup01 datastore. In the top menu bar, permissions can be created and adjusted in the Permissions tab. Initially, a new one is created. In the following dialog, the datastore path, the user and the role must be defined:

Path: /datastore/backup01
User: dc01cluster22backup01@pbs
Role: DatastoreAdmin
Propagate: True
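
The same permission can presumably also be set from a shell on the Proxmox Backup Server with proxmox-backup-manager. This is a sketch under that assumption: the function name grant_datastore_admin is hypothetical, and the command is wrapped in a function so that nothing runs until it is called.

```shell
#!/bin/sh
# Hypothetical helper, to be run on the Proxmox Backup Server:
# grant the DatastoreAdmin role on the backup01 datastore to our backup user.
grant_datastore_admin() {
  proxmox-backup-manager acl update /datastore/backup01 DatastoreAdmin \
    --auth-id 'dc01cluster22backup01@pbs'
}
```

Path, role and user are the values from the dialog above; adjust them to your own naming scheme.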

For a short overview, the available roles are listed here without further explanation:

Admin
Audit
DatastoreAdmin
DatastoreAudit
DatastoreBackup
DatastorePowerUser
DatastoreReader

Administration: Proxmox VE

The integration of the backup datastore will be performed from the Proxmox VE instances via the Datacenter. As a result, the Proxmox VE web frontend will now be used for further administrative actions. The Proxmox VE web frontend runs on tcp/8006. Therefore, all next tasks will be done on https://<IP-PROXMOX-VE-SERVER>:8006.

Adding Storage

Integrating the Proxmox Backup Server works the same way as adding shared storage to a Proxmox datacenter.

In the left menu we choose the active datacenter and select the Storage option. There, we find all natively supported storage options of Proxmox (NFS, SMB/CIFS, iSCSI, ZFS, GlusterFS, …) and select Proxmox Backup Server as a dedicated item.

Afterwards, the details for adding this datastore to the datacenter must be inserted. The following options need to be defined:

ID: backup22
Server: <FQDN-OR-IP-OF-BACKUP-SERVER>
Username: dc01cluster22backup01@pbs
Password: <THE-PASSWORD-OF-THE-USER>
Enable: True
Datastore: backup01
Fingerprint: <SYSTEM-FINGERPRINT-OF-BACKUP-SERVER>
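
For scripted setups, the same storage can be added from a shell on a Proxmox VE node with pvesm instead of the web frontend. The following is a sketch using the values from the list above; the function name add_pbs_storage and its argument order are our own, hypothetical choices, and the command is wrapped in a function so that nothing runs until it is called.

```shell
#!/bin/sh
# Hypothetical helper, to be run on a Proxmox VE node.
# Arguments: <server> <password> <fingerprint>
add_pbs_storage() {
  pvesm add pbs backup22 \
    --server "$1" \
    --datastore backup01 \
    --username 'dc01cluster22backup01@pbs' \
    --password "$2" \
    --fingerprint "$3"
}
```

The fingerprint can be read on the Proxmox Backup Server, for example with proxmox-backup-manager cert info.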

Optionally, the Backup Retention and Encryption can also be configured before adding the new backup datastore. While the backup retention can also be configured on the Proxmox Backup Server (which is recommended), enabling the encryption should be considered. Selecting and activating the encryption is easily done by setting it to Auto-generate a client encryption key. Depending on your previous setup, an already existing key can also be uploaded and used.

After adding this backup datastore to the datacenter, it can immediately be used for backups and the integration is complete.

Conclusion

With the Proxmox Backup Server, Proxmox provides an enterprise backup solution for backing up Linux containers and virtual machines. It supports features like incremental and fully deduplicated backups by building on different open-source solutions, combined with strong encryption and data integrity. This solution is proof that open-source software can compete with closed-source enterprise software. Together with Proxmox VE, enterprise-grade virtualization environments can be created and managed without missing the typical enterprise feature set. Proxmox VE and the Proxmox Backup Server can also be used together with storage appliances from vendors like NetApp by directly using iSCSI or NFS.

This was a simple example; there are of course much more complex scenarios which can be created and should also be considered. We are happy to provide you with more information and to assist you in creating such setups. We also provide help for migrating from other products to Proxmox VE setups. Feel free to contact us at any time for more information.

Migrating VMs from VMware ESXi to Proxmox

In response to Broadcom’s recent alterations in VMware’s subscription model, an increasing number of enterprises are reevaluating their virtualization strategies. With heightened concerns over licensing costs and accessibility to features, businesses are turning towards open source solutions for greater flexibility and cost-effectiveness. Proxmox, in particular, has garnered significant attention as a viable alternative. Renowned for its robust feature set and open architecture, Proxmox offers a compelling platform for organizations seeking to mitigate the impact of proprietary licensing models while retaining comprehensive virtualization capabilities. This trend underscores a broader industry shift towards embracing open-source technologies as viable alternatives in the virtualization landscape. Just to mention, Proxmox is widely known as a viable alternative to VMware ESXi but there are also other options available, such as bhyve which we also covered in one of our blog posts.

Benefits of Open Source Solutions

In the dynamic landscape of modern business, the choice to adopt open source solutions for virtualization presents a strategic advantage for enterprises. With platforms like KVM, Xen and even LXC containers, organizations can capitalize on the absence of license fees, unlocking significant cost savings and redirecting resources towards innovation and growth. This financial flexibility empowers companies to make strategic investments in their IT infrastructure without the burden of proprietary licensing costs. Moreover, open source virtualization promotes collaboration and transparency, allowing businesses to tailor their environments to suit their unique needs and seamlessly integrate with existing systems. Through community-driven development and robust support networks, enterprises gain access to a wealth of expertise and resources, ensuring the reliability, security, and scalability of their virtualized infrastructure. Embracing open source virtualization not only delivers tangible financial benefits but also equips organizations with the agility and adaptability needed to thrive in an ever-evolving digital landscape.

Migrating a VM

Prerequisites

To ensure a smooth migration process from VMware ESXi to Proxmox, several key steps must be taken. First, SSH access must be enabled on both the VMware ESXi host and the Proxmox host, allowing for remote management and administration. Additionally, it’s crucial to have access to both systems, facilitating the migration process. Furthermore, establishing SSH connectivity between VMware ESXi and Proxmox is essential for seamless communication between the two platforms. This ensures efficient data transfer and management during migration. Moreover, it’s imperative to configure the Proxmox system or cluster in a manner similar to the ESXi setup, especially concerning networking configurations. This includes ensuring compatibility with VLANs or VXLANs for more complex setups. Additionally, both systems should either run on local storage or have access to shared storage, such as NFS, to facilitate the transfer of virtual machine data. Lastly, before initiating the migration, it’s essential to verify that the Proxmox system has sufficient available space to accommodate the imported virtual machine, ensuring a successful transition without storage constraints.

Activate SSH on ESXi

The SSH server must be activated in order to copy the content from the ESXi system to the new location on the Proxmox server. The copy will later be initiated from the Proxmox server. Therefore, it is necessary that the Proxmox system can establish an SSH connection on tcp/22 to the ESXi system:

Log in to the VMware ESXi host.
Navigate to Configuration > Security Profile.
Enable SSH under Services.

Find Source Information about VM on ESXi

One of the challenging parts is finding the location of the virtual machine and its virtual machine disk. The path can be found within the web UI of the ESXi system:

Locate the ESXi node that runs the virtual machine to be migrated.
Identify the virtual machine to be migrated (e.g., pgsql07.gyptazy.ch).
Obtain the location of the virtual disk (VMDK) associated with the VM from the configuration panel.
The VM location path should be shown (e.g., /vmfs/volumes/137b4261-68e88bae-0000-000000000000/pgsql07.gyptazy.ch).
Shut down the VM.
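The same information is also available on the command line once SSH is enabled. The sketch below, run from the Proxmox host, uses ESXi's vim-cmd utility; the datastore name datastore1 in the comments is a hypothetical example, and vmx_to_dir is a helper introduced here for illustration only. It relies on ESXi exposing each datastore under /vmfs/volumes by name as well as by UUID.

```shell
#!/bin/sh
# Sketch, run from the Proxmox host. Host name is the article's example.
ESXI_HOST="root@esx02-test.gyptazy.ch"

# List all registered VMs; the "File" column looks like
# "[datastore1] pgsql07.gyptazy.ch/pgsql07.gyptazy.ch.vmx".
list_vms() {
    ssh "$ESXI_HOST" vim-cmd vmsvc/getallvms
}

# Turn such a "File" value into the VM directory below /vmfs/volumes.
vmx_to_dir() {
    printf '%s\n' "$1" | sed 's|^\[\([^]]*\)] \(.*\)/[^/]*$|/vmfs/volumes/\1/\2|'
}

# Ask the guest to shut down (requires VMware Tools in the guest), then
# report the power state. $1 is the Vmid shown by list_vms, not a Proxmox ID.
shutdown_vm() {
    ssh "$ESXI_HOST" vim-cmd vmsvc/power.shutdown "$1"
    ssh "$ESXI_HOST" vim-cmd vmsvc/power.getstate "$1"
}
```

The guest shutdown is asynchronous, so the power state may need to be queried again until it reports that the VM is powered off.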

Create a New Empty VM on Proxmox

Create a new empty VM in Proxmox.
Assign the same resources as in the ESXi setup.
Set the network type to VMware vmxnet3.
- Ensure the needed network resources (e.g., VLAN, VXLAN) are properly configured.
Set the SCSI controller for the disk to VMware PVSCSI.
- Do not create a new disk (this will be imported later from the ESXi source).
Each VM gets an ID assigned by Proxmox (note it down, it will be needed later).
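The steps above can also be carried out with the qm command line tool instead of the web UI. This is a sketch under assumptions: the memory size, core count, and bridge name vmbr0 in the usage comment are placeholders that need to match the ESXi VM and your network setup, and the run helper is introduced here only to allow a dry run.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Create an empty VM without a disk, using the VMware-compatible
# network card (vmxnet3) and SCSI controller (pvscsi).
# $1 VM ID, $2 name, $3 memory in MiB, $4 cores, $5 bridge
create_empty_vm() {
    vmid=$1 name=$2 memory_mib=$3 cores=$4 bridge=$5
    run qm create "$vmid" \
        --name "$name" \
        --memory "$memory_mib" \
        --cores "$cores" \
        --net0 "vmxnet3,bridge=$bridge" \
        --scsihw pvscsi
}

# Example with placeholder resources (119 is the ID used in this article):
# DRY_RUN=1 create_empty_vm 119 pgsql07.gyptazy.ch 4096 2 vmbr0
```

A free VM ID can be obtained with pvesh get /cluster/nextid if you do not want to choose one manually.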

Copy VM from ESXi to Proxmox

The content of the virtual machine (VM) will be transferred from the ESXi to the Proxmox system using the open source tool rsync for efficient synchronization and copying. Therefore, the following commands need to be executed from the Proxmox system, where we create a temporary directory to store the VM’s content:

mkdir /tmp/migration_pgsql07.gyptazy.ch
cd /tmp/migration_pgsql07.gyptazy.ch
rsync -avP root@esx02-test.gyptazy.ch:/vmfs/volumes/137b4261-68e88bae-0000-000000000000/pgsql07.gyptazy.ch/* .

Depending on the size of the virtual machine and the network connectivity, this process may take some time.

Import VM in Proxmox

Afterwards, the disk is imported with the qm utility by specifying the VM ID (assigned when the VM was created), the name of the copied disk file, and the destination storage on the Proxmox system where the VM disk should be stored:

qm disk import 119 pgsql07.gyptazy.ch.vmdk local-lvm

Depending on how the VM was created or exported, there may be multiple disk files, some of which carry a -flat suffix. The descriptor .vmdk references its -flat data file, so both must be in the same directory. The import needs to be repeated for every disk of the VM.
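An imported disk typically shows up in the VM configuration as an unused disk and still has to be attached to a controller before the VM can boot from it. The following sketch shows one way to do this with qm set; the volume name local-lvm:vm-119-disk-0 and the slot scsi0 are assumptions, so take the actual name from the import output or from the unused0 entry in qm config 119. The run helper exists only to allow a dry run.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Attach an imported volume to a bus slot and boot from it.
# $1 VM ID, $2 volume as reported by the import, $3 slot (default scsi0)
attach_imported_disk() {
    vmid=$1 volume=$2 slot=${3:-scsi0}
    run qm set "$vmid" "--$slot" "$volume"
    run qm set "$vmid" --boot "order=$slot"
}

# Example (volume name is an assumption, verify it with `qm config 119`):
# attach_imported_disk 119 local-lvm:vm-119-disk-0
```

Additional disks can be attached the same way on scsi1, scsi2, and so on; only the boot disk needs to appear in the boot order.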

Starting the VM

In the final step, all settings, resources, definitions and customizations of the system should be thoroughly reviewed. Once validated, the VM can be launched, ensuring that all components are correctly configured for operation within the Proxmox environment.
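Part of this review can be scripted. The sketch below only starts the VM if the configuration contains the settings described in this article; it assumes that the imported disk has been attached as scsi0, and it is no substitute for a manual check of resources and network settings. Both function names are hypothetical helpers, not part of Proxmox.

```shell
#!/bin/sh
# Reads `qm config` output on stdin; succeeds only if the PVSCSI controller,
# a vmxnet3 network card and a disk on scsi0 (assumed slot) are present.
config_ok() {
    cfg=$(cat)
    printf '%s\n' "$cfg" | grep -q '^scsihw: pvscsi' &&
    printf '%s\n' "$cfg" | grep -q '^net0: vmxnet3' &&
    printf '%s\n' "$cfg" | grep -q '^scsi0: '
}

# Start the VM only when the basic checks pass.
start_if_ok() {
    vmid=$1
    if qm config "$vmid" | config_ok; then
        qm start "$vmid"
    else
        echo "VM $vmid: configuration incomplete, not starting" >&2
        return 1
    fi
}

# Usage: start_if_ok 119
```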

Conclusion

This article only covers one of many possible methods for migrations in simple, standalone setups. In more complex environments involving multiple host nodes and different storage systems like Fibre Channel or network storage, there are significant differences and additional considerations. There may also be specific requirements regarding availability and Service Level Agreements (SLAs) to consider, which can differ considerably from one environment to another. Feel free to contact us for personalized guidance on your specific migration needs at any time. We are also pleased to offer our support in related areas in open source such as virtualization (e.g., OpenStack, VirtualBox) and topics pertaining to cloud migrations.

Addendum

On the 27th of March, Proxmox released their new import wizard (pve-esxi-import-tools), which makes migrations from VMware ESXi instances to a Proxmox environment much easier. In an upcoming blog post, we will provide more information about the new tooling, the cases where it is most useful, and the corner cases where the new import wizard cannot be used.
