
Cluster Shared Volumes: Guest SQL Database very poor disk IO Performance

Hi,

We are experiencing a serious performance issue with a SQL Express server running as a guest OS in Hyper-V, using a Cluster Shared Volume (CSV) as the virtual disk storage.

During a database import operation, the host OS shows a disk transfer rate of 2 Mb/s and 100% disk time. At present it takes approx. 20x longer to perform this import on the guest OS than it did on the old physical server with a single spindle disk...

The disk array in question is nowhere near full load; a second volume that is not clustered and shares the same spindles can deliver upwards of 100 Mb/s while this database import is in progress.

We are also experiencing similar problems during backups of the VHD files using DPM. Whenever a backup runs, the CSV LUN reports 100% disk time, but the second non-CSV volume still operates at very satisfactory speeds; everything on the CSV grinds to a halt, i.e. all the virtual machines.

I'm really stumped as to what to look into next; it's almost as if there is some bottleneck in the CSV stack that is not allowing it to fully utilise the underlying disk array's performance.

In summary, why is my CSV volume nowhere near using the full performance of the disk array?
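
A quick thing to rule out here is CSV redirected access, where all I/O for the volume is funnelled over the cluster network through the coordinator node instead of going straight to disk; backups and incompatible filter drivers are classic triggers, and it would produce exactly this "slow CSV, fast non-CSV volume on the same spindles" pattern. A minimal check, assuming Server 2012 or later (earlier builds expose this only in Failover Cluster Manager):

    # Hedged sketch: show whether each CSV is in direct or redirected access, and why
    # (requires the FailoverClusters module on Server 2012 or later):
    Import-Module FailoverClusters
    Get-ClusterSharedVolumeState |
        Format-Table Name, Node, StateInfo, FileSystemRedirectedIOReason -AutoSize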

Any help much appreciated.


Cluster Sets documentation

Cluster engineering has just posted a new blog entry regarding Cluster Sets that includes a video and links to the Cluster Sets documentation.

https://blogs.msdn.microsoft.com/clustering/2018/07/10/introduction-to-cluster-sets-in-windows-server-2019/



tim

DAG is not working

Hi all,

We currently have a DAG with two Windows Server 2012 servers running an Exchange 2013 DAG. The issue is that the DAG is not working, and when checking the cluster service it shows "Unable to obtain a login token" (event ID 1228).

This happened after a power failure on one of the nodes.

Any idea how to start troubleshooting and bring this DAG online?
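
Event 1228 generally means the cluster network name could not authenticate against Active Directory, which fits a power failure damaging the node's secure channel or the CNO state. A hedged first pass at data gathering (the domain name below is a placeholder):

    # Run on one of the DAG members; everything is read-only except Get-ClusterLog,
    # which just writes a log file to the destination folder.
    Import-Module FailoverClusters
    Get-ClusterNode                                    # are both nodes Up?
    Get-ClusterResource "Cluster Name"                 # state of the core network name (CNO)
    nltest /sc_query:yourdomain.local                  # node's secure channel to the domain (example name)
    Get-ClusterLog -UseLocalTime -Destination C:\Temp  # detailed cluster log showing the 1228 context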

thanks 

SQL 2012 Failover Cluster - unable to start because the 'Network Name' resource failed

Hi all,

Running a 2012 R2 failover cluster with SQL 2012. I'm unable to start the SQL 2012 cluster role because of the following error:

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Event ID:      1069
Description:
Cluster resource 'SQL Network Name (SCSQLCL01)' of type 'Network Name' in clustered role 'SQL Server (VMM)' failed.

Failover Cluster Manager shows the following (screenshots omitted):


Observations thus far:

  • Passes all cluster validation tests (no issues)
  • I am sometimes seeing Kerberos errors in the log on both cluster members, but it's not consistent and I cannot pin down the cause (see the SPN checks after this list):

The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server scsqlcl01-2$. The target name used was MSServerClusterMgmtAPI/SCSQLCL01CORE.service.local.

  • The cluster computer object has been granted permissions on the cluster
  • All computer objects are created, and DNS entries are present.
  • It sometimes "just works": it comes online without a hitch and I can communicate with the SQL instance via the cluster name with no problems.
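
Since KRB_AP_ERR_MODIFIED almost always points at service principal name (SPN) problems, duplicate or stale SPNs would also explain why the network name only fails intermittently (it depends on which DC and ticket you happen to hit). A hedged set of checks, assuming domain admin rights:

    # Look for duplicate or misplaced SPNs for the SQL network name:
    setspn -L SCSQLCL01        # SPNs registered on the network name's computer object
    setspn -X                  # search the forest for duplicate SPNs
    # If the network name's AD/DNS registration looks stale, re-register it
    # (FailoverClusters module, Server 2012 or later):
    Update-ClusterNetworkNameResource -Name "SQL Network Name (SCSQLCL01)"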

Any help would be appreciated.

Thanks.

Hyper-V replica from WS2012 Cluster to standalone server

Hello, I would like your opinion on whether or not this is normal behaviour.

I am experiencing an issue regarding the behaviour of my replication architecture from clustered servers to a standalone replica server.

I am running highly available VMs on a WS2012 Hyper-V cluster (2 Hyper-V nodes). The storage (NetApp FAS2240) is attached via iSCSI. CSV is enabled. The Hyper-V Replica Broker is OK…

Everything works so far regarding the cluster itself and replication. My VMs are replicated to a third WS2012 Hyper-V server outside the cluster (same domain). Replication is OK, failover test OK, planned failover OK, reverse replication OK!

The problem: when I simulate a power outage on one of the cluster nodes (everything else operates as expected), the VMs are restarted on the surviving node, but the replication status goes to "Critical" and a full resynchronization is then needed (Hyper-V-VMMS event ID 32544: virtual disk modified by compaction, expansion or offline)?!

In brief, replication is OK when the two nodes are online: I can live migrate the VMs (replication stays OK), move the broker from one node to the other (replication stays OK), and move the CSV from one owner to the other (replication stays OK). But as soon as one of the two nodes is offline, the replication is broken and needs a full resync: event IDs 32544, 32326, 32026!

Is this behaviour normal when replicating from a cluster, or should replication continue normally despite the node failover? I guess something is wrong there.
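
For what it's worth, a hedged way to inspect and recover the replication state from the surviving node; the VM name is a placeholder:

    # Check the replication state/health for the affected VM:
    Get-VMReplication -VMName "MyVM" | Format-List State, Health, Mode
    # Kick off the full resynchronization that the 32544 error is asking for:
    Resume-VMReplication -VMName "MyVM" -Resynchronize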

BRGDS
Ludo

Hot Topic | Installing Exchange 2013 CU22

Hey Guys, 

Initiating this thread to discuss and prepare ourselves to install the recent CU22 update for Exchange 2013.

From the TechNet blogs it's clear that the following issues are addressed and fixed:

  • 4487603 "The action cannot be completed" error when you select many recipients in the Address Book of Outlook in Exchange Server 2013
  • 4490060 Exchange Web Services Push Notifications can be used to gain unauthorized access
  • 4490059 Reducing permissions required to run Exchange Server using Shared Permissions Model

When it comes to real-world deployment, we have a few things to clarify for a smooth installation.

1. Will this reduced permissions model impact any Exchange operations?

2. Will the changes to the EWS architecture impact Skype for Business, Outlook for Mac, or any other applications?

"After this change, clients that rely on an authenticated EWS Push Notification from the server that is running Exchange Server will require a client update to continue to function correctly."

One major concern is that we had to reset the Exchange computer account password.
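
If the machine account password does have to be reset, one approach is to reset it from the server itself so the secure channel stays consistent; this is only a sketch, the DC name is a placeholder, and it should be validated for Exchange before use:

    # Run on the Exchange server; prompts for a credential allowed to reset
    # the machine account password against the named DC (example name):
    Reset-ComputerMachinePassword -Server "DC01" -Credential (Get-Credential)
    Test-ComputerSecureChannel -Verbose    # confirm the secure channel afterwards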

Please do pour in your input so we can perform a flawless upgrade.


Ganesh G | Messaging Consultant

Listing of all Failover Clustering Events for Reference

I am not a moderator, but this might be worth pinning as a sticky for this forum.

Ever need to know what a particular cluster event means?  Have trouble finding a good explanation of the code?  The Microsoft Program Manager (John Marlin) responsible for maintaining this information just put up a blog post that contains a complete list of all the 2016 and 2019 event codes.  Earlier versions often use the same codes; newer versions generally add codes.

Find the blog here - https://techcommunity.microsoft.com/t5/Failover-Clustering/You-been-asking-I-am-delivering/ba-p/447150

The spreadsheet is found here - https://techcommunity.microsoft.com/t5/Failover-Clustering/You-been-asking-I-am-delivering/ba-p/447150?attachment-id=12219
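
To pull the event IDs you are actually seeing so you can look them up against that list, something like this works on any node (a hedged example; adjust -MaxEvents to taste):

    # Most recent failover clustering events from the System log on this node:
    Get-WinEvent -FilterHashtable @{ LogName = 'System'; ProviderName = 'Microsoft-Windows-FailoverClustering' } -MaxEvents 50 |
        Format-Table TimeCreated, Id, LevelDisplayName, Message -AutoSize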


tim

Microsoft Windows Server 2012 cluster issue

Hello!

We have a Hyper-V cluster comprising multiple blade servers, all running Windows Server 2012. When I try to live migrate between nodes, only one node hits the error below.

Note:

That node's hardware configuration is the same as the other nodes'.

--------------------------------------------------------------------------------------------------------------------------------------------

Live Migration of Virtual Machine operation for 'External Secure FTP'

Virtual Machine operation for 'External Secure FTP' failed at migration destination 'B-HVM-P786-P10'. (Virtual machine ID FF560AA2-0908-4979-8AD4-CB2B0AFEBAE5)

The virtual machine 'External Secure FTP' is using processor-specific features not supported on physical computer 'BHVM-P786-P10'. To allow for migration of this virtual machine to a physical computer with different processors, modify the virtual machine settings to limit the processor features used by the virtual machine. (Virtual machine ID FF560AA2-0908-4979-8AD4-CB2B0AFEBAE5)
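
The error is the standard CPU feature mismatch message, and the fix it suggests is processor compatibility mode, which masks the VM down to a common feature set; note the VM has to be shut down to change it. A hedged sketch using the VM name from the error:

    # Enable processor compatibility mode so the VM can migrate between hosts
    # with different CPU feature sets (the VM must be off):
    Stop-VM -Name "External Secure FTP"
    Set-VMProcessor -VMName "External Secure FTP" -CompatibilityForMigrationEnabled $true
    Start-VM -Name "External Secure FTP"
    # Also worth checking: "identical" blades can still expose different CPU
    # features if BIOS/firmware settings or microcode levels differ on that one node.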


Failover Clustering Task Scheduler Survey

Validate Active Directory Configuration

    Hi Folks, 

    Getting the below error while running cluster failover validation.

    Description: Validate that all the nodes have the same domain, domain role, and organizational unit.

    Start: 8/14/2019 2:54:31 PM.
    Validating that all nodes have the same domain, domain role, and organizational unit.
    Fqdn:                 USTYHPV01..COM
    Domain:               .COM
    Domain Role:          Member Server
    Site Name:            Default-First-Site-Name
    Organizational Unit:  (blank)
    The distinguished name of node USTYHPV01 could not be determined because of this error: There was an error getting information about the organization unit for node 'USTYHPV01..COM' from the domain '.com'.
    The organizational unit of node USTYHPV01.COM could not be determined because of this error: Did not find an Organization Unit (OU) in the Active Directory
    Connectivity to a writable domain controller from node USTYHPV01.COM could not be determined because of this error: Could not get domain controller name from machine USTYHPV01.
    Node(s) USTYHPV01.COM cannot reach a writable domain controller. Please check connectivity of these nodes to the domain controllers.
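
All three failures come down to the node not being able to locate a writable domain controller or its own computer object, and the double dot in the reported FQDN (USTYHPV01..COM) hints at a misconfigured primary DNS suffix. Some hedged checks from the failing node (the domain name is a placeholder):

    nltest /dsgetdc:yourdomain.com /writable   # can the node locate a writable DC?
    nltest /sc_query:yourdomain.com            # state of the node's secure channel
    Test-ComputerSecureChannel -Verbose        # PowerShell equivalent of the above
    Resolve-DnsName yourdomain.com             # basic DNS resolution for the domain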

Microsoft Server 2008 Failover Cluster Error Event ID 1230

We are getting the following error, Event ID 1230: "Cluster resource 'FileServer-(xxxx)(Cluster Disk 3)' (resource type '', DLL 'clusres.dll') either crashed or deadlocked. The Resource Hosting Subsystem (RHS) process will now attempt to terminate, and the resource will be marked to run in a separate monitor."

No RESRCMON.DMP was created, so I can't troubleshoot to find the DLL causing the problem.

Suggestions, anyone?

It seems to me that once RHS blows up, the other cluster resources also blow up; these include an FSRM report, the TSM (Tivoli Storage Manager) Client Acceptor, and the TSM Scheduler service.
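
Since every resource in the default Resource Hosting Subsystem process dies together, one hedged way to narrow down the culprit is to move suspect resources into their own monitor, so the next crash identifies itself (the resource name below is the one from the event; on plain Server 2008 this is done with cluster.exe):

    # Run each suspect resource in a separate RHS monitor process:
    cluster res "FileServer-(xxxx)(Cluster Disk 3)" /prop SeparateMonitor=1
    # On clusters with the PowerShell module (2008 R2 and later), the same
    # property can be set like this:
    # (Get-ClusterResource "FileServer-(xxxx)(Cluster Disk 3)").SeparateMonitor = 1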

Rebooting Server 2016 SQL Failover Cluster Node results in Blue Screen 0x0D1 after trying to recover cluster state upon booting up

Hello,

I have an odd one. While a node is live, without draining it or removing it from the cluster, we do the following:

1. Reboot it

2. Upon coming back up, sign in

3. Within a minute it'll blue screen

4. Boot back up, sign in, everything is fine

The dump shows ntoskrnl.exe, DRIVER_IRQL_NOT_LESS_OR_EQUAL (0x000000D1)

If you check the cluster Operational log, you'll see it start some GUM processing with GrantLock and RequestLock entries. This happens over and over until it blue screens. The reboot following the blue screen shows GUM activity too, but only the "executing locally" entries. Events below:

Preceding the blue screen (these repeated over and over and were even suppressed, per the Application log):

[GUM] Node 2: Processing RequestLock 4:595
[GUM] Node 2: Processing GrantLock to 4 (sent by 5 gumid: 20121)

Post blue screen (note these also showed pre-blue-screen, but rarely):

[GUM]Node 2: Executing locally gumId: 20121, updates: 1, first action: /dm/update

Before the blue screen, the following happens with the NIC in Event Viewer. Keep in mind this NIC is part of a team: 2 of the 4 team members are down (waiting to be plugged in if the others die) and 2 are live. The team is handled by the OS in Server Manager. We are using the latest Intel drivers, not the in-box drivers.

Reboot - 9:51
Kernel Power Hardware Notifications upon boot up
Connectivity state in standby: Disconnected, Reason: NIC compliance - 9:54

both adapters come online - 9:54
Intel® Ethernet 10G 4P X520/I350 rNDC
Network link has been established at 10Gbps full duplex.

and

Intel® Ethernet 10G 2P X520 Adapter #2
Network link has been established at 10Gbps full duplex.

===============================

NICs report disconnected

Intel® Ethernet 10G 2P X520 Adapter
Network link is disconnected.

Intel® Ethernet 10G 4P X520/I350 rNDC #2
Network link is disconnected.


MsLbfoSys

Member Nic {30793b81-07bd-4afe-85f6-6dd873581384} Connected.

NIC Disconnects again

Intel® Ethernet 10G 4P X520/I350 rNDC
Network link is disconnected.


NICs reconnect

Intel® Ethernet 10G 4P X520/I350 rNDC
Network link has been established at 10Gbps full duplex.

MsLbfoSys

Member Nic {7947a925-563e-4bf8-b3c6-73c46ef2d4ed} Connected.


DNS Resolution and Domain Resolution fail - 9:55

iphlpsvc (IP Helper) reports that the network is coming up - 9:55

At this point you can sign in to the server, and shortly thereafter it'll blue screen. I have not yet tested it, but I believe it will also blue screen without signing in (as was reported to me); I'm just relaying the recent event. This doesn't happen every time but is about 50/50. I'll test in my lab this coming week to reproduce. Anything additional I should capture? As a note, this is reproducible across 10 similar physical servers, split into 2 clusters of 5 each.

I see a hotfix for this 0xD1 issue for Server 2012, but this is 2016. I have a feeling that the network coming up causes Windows or the cluster to grab the driver's address space, and then the other one tries for it during the network recovery above but fails because the address space is not released. I am assuming the cluster is snagging it and Windows is trying afterwards, hence ntoskrnl.exe being blamed.

Any input would be great; this is an odd one and I'm hoping to track it down. I understand that delaying the startup of the SQL services might be a suggestion, but I've read mixed reviews on doing that, and since this looks like cluster activity rather than SQL activity, I'm wondering if that is even relevant here.
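
For the next repro, a hedged capture checklist so the dump actually names the driver rather than ntoskrnl.exe:

    # 1) Confirm a kernel or complete dump is configured (CrashDumpEnabled:
    #    1 = complete, 2 = kernel, 7 = automatic):
    Get-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl" |
        Select-Object CrashDumpEnabled, DumpFile
    # 2) Right after the blue screen, pull a time-bounded cluster log from the node:
    Get-ClusterLog -Node $env:COMPUTERNAME -TimeSpan 30 -UseLocalTime -Destination C:\Temp
    # 3) In the dump, walk the stack below ntoskrnl.exe; with 0xD1 the faulting
    #    module is usually a network or filter driver rather than the kernel itself.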


ADMIN$ share on the failover cluster does not exist

Colleagues, help!
I have a working two-node cluster named CLUSTER. Both nodes run Windows Server 2012 R2. I am trying to install the Data Protection Manager 2012 R2 agent to protect my cluster. In doing so, I get the error "The agent operation failed because the ADMIN$ share on CLUSTER does not exist".
I checked and this is what I found:
\\node01\admin$ - available
\\node02\admin$ - available
\\CLUSTER\admin$ - not available
How can I open access to the administrative shared folder on the cluster?
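
One hedged angle: \\CLUSTER resolves to the cluster name object, which is only reachable on the node that currently hosts the "Cluster Name" resource, and the CNO does not necessarily expose the hidden admin$ share the way a node does. Checking which node answers for the name is a reasonable first step:

    Import-Module FailoverClusters
    (Get-ClusterGroup "Cluster Group").OwnerNode   # node currently hosting the cluster name
    Resolve-DnsName CLUSTER                        # confirm the name resolves to that node's address
    Test-Path "\\CLUSTER\admin$"                   # retest the administrative share
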
Thanks!

Windows cluster - Hardware migration with Windows 2016 upgrade

Dear cluster gurus,

This is my scenario:

Existing setup: HP Gen7 physical hosts with Windows 2008 R2. We have a two-node Windows cluster (node1 & node2) running an Oracle database and third-party applications. It has 4 virtual IPs and 10 shared disks.

New setup: we want to migrate the above cluster to new HP Gen10 hardware with Windows Server 2016 LTSB, without changing the hostnames or cluster names.

Proposed method 1: take an image of node1 and node2 (Windows 2008 R2) --> clone that image to the new hardware node1 & node2 --> shut down the old node1 & node2 --> start up the new node1 & node2 --> start the Windows 2016 upgrade on node1 & node2.

Does the above method work? Has anyone tried this?

What are the alternative solutions?

Thanks

Local Administrators group permission denied on the quorum disk by mistake

Hi,

 I have a two-node failover cluster on Windows Server 2003. Its cluster service account has local admin privileges. I removed this privilege from the quorum disk by adding a Deny entry for the Administrators group to the quorum disk's security permissions. Now all the storage disks have vanished from both nodes and the cluster service is not running. What is the way out?
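
A hedged recovery sketch for Server 2003: the cluster service can be started with the /fixquorum switch so it comes up without arbitrating the quorum, after which the Deny entry can be stripped off the disk. The Q: drive letter and "Disk Q:" resource name are placeholders:

    # From an elevated command prompt on one node:
    net start clussvc /fixquorum          # start the cluster service without the quorum online
    cluster res "Disk Q:" /online         # bring the quorum disk online manually
    cacls Q:\ /E /R Administrators        # remove the entries (including Deny) for Administrators
    cacls Q:\ /E /G Administrators:F      # re-grant Full Control
    net stop clussvc                      # then restart the service normally
    net start clussvc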


File Share Witness Path Changes

Hi 

We are planning to change the file share witness (FSW) path on an existing three-node cluster. Is downtime required? If not, what is the impact of changing the path while the cluster is running? Also, do we need to copy any old config files to the new share?

Appreciate your assistance
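
For reference, the witness is reconfigured with Set-ClusterQuorum, which in my experience is an online operation, and the cluster creates the witness data on the new share itself, so nothing is copied from the old one. A hedged sketch with a placeholder share path:

    # Point the cluster at the new witness share (cluster stays online):
    Set-ClusterQuorum -FileShareWitness "\\NEWSERVER\ClusterWitness"
    Get-ClusterQuorum    # verify the quorum configuration afterwards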

New Architecture with Windows Failover Cluster

Hello
  I need to configure a new architecture on Windows Server 2019 with Windows 10 end-points. This configuration will host domain controllers (AD DS & DNS), DHCP, KMS, certificate services, a log server, SCCM, and a mail server (Exchange). The mail section will be configured last, separately, with a DAG.

  I'm a beginner with failover clustering, but I chose this solution in order to implement HA. I have tested the configuration with 3 nodes running the File Server role. It works wonderfully.

  I need some expert advice on how to configure the cluster's roles and how to implement the services.
  So... can someone please help me with answers to the following questions:
 
  - If I have over 10,000 end-points, are 3 cluster nodes enough? And a single iSCSI Target server?
 
  - Which is the best configuration for the DHCP server?
    I'm thinking about 3 possible solutions (a sketch of the built-in DHCP failover setup follows this list):
    * 2 DHCP servers using the built-in DHCP failover, installed on 2 VMs hosted on the failover cluster;
    * 2 DHCP servers installed directly as a clustered DHCP role on the failover cluster;
    * 2 DHCP servers in a built-in DHCP failover configuration (outside the cluster).
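
  Below is a minimal sketch of the built-in DHCP failover relationship used by the first and third options. This is only an illustration: the server names, scope ID, and shared secret are placeholders, and it assumes the DhcpServer PowerShell module on Server 2012 or later.

    # Hedged sketch: a load-balanced DHCP failover relationship between two
    # DHCP servers, no cluster involved (all names and values are examples):
    Add-DhcpServerv4Failover -ComputerName "DHCP1" -PartnerServer "DHCP2" `
        -Name "DHCP1-DHCP2-LoadBalance" -ScopeId 10.0.0.0 `
        -LoadBalancePercent 50 -SharedSecret "ExampleSecret"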

  - Should the iSCSI Target server be a domain controller or just a member server?
  - Taking into account that the WSUS server generates a lot of traffic, should it be configured outside the cluster? Same question for SCCM and the log server (event forwarding/subscriptions).

  - I have over 50 file servers on Samba (Red Hat) on different subnets which are to be migrated to Windows file servers. Taking into account that the cluster's iSCSI disks are local disks (not UNC paths), this limits the number of file servers. In this case, what do you think is the best solution to the problem?

Best regards

Dup pings on one node

Hi,

We have a failover cluster on Windows Server 2019 with 2 nodes.

When I ping a VM from a Linux machine I get duplicate ping replies if the VM is on the first node. All VMs on that node behave the same way. If I move the VM to the second node, there are no duplicate pings anymore.

From what I've read, this situation is normal on an NLB cluster and can be avoided with "FilterIcmp=0". But this is not an NLB cluster, therefore I cannot find a FilterIcmp property.

HW environments are the same, driver versions are the same.

Do you have any recommendations on how I can track down the root cause of this issue?
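
Duplicate replies usually mean the frame is being delivered or answered twice somewhere between the team and the vSwitch, so comparing the teaming and switch configuration of the two "identical" hosts is a reasonable first step. A hedged sketch (the VM name is a placeholder):

    # Compare these between node 1 and node 2:
    Get-NetLbfoTeam | Format-List Name, TeamingMode, LoadBalancingAlgorithm
    Get-VMSwitch | Format-List Name, EmbeddedTeamingEnabled, NetAdapterInterfaceDescription
    Get-VMNetworkAdapter -VMName "MyVM" | Format-List MacAddress, PortMirroringMode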



Windows Server Failover Clustering

Hello good afternoon,

I do not know if this is the right place, but I have a question: we want to create a failover cluster with Windows Server 2019. Does it require a particular edition (Standard, Enterprise, Datacenter)?

Does anyone know?

Regards.

Why continue to choose disk witness when file share witness is better

Hi,

I have run a lot of tests over the last 3 weeks and my conclusion is: the file share witness is the best option.

And this is why:

I have two-node clusters running on physical blades in 2 different chassis.

- Node 1 owns the witness disk and one role (SQL).

- I remove the network cable from the chassis where node 1 is running.

- Node 1 never releases its disk reservation on the witness disk, and the cluster service stops on node 2.

When I would like node 2 to become the sole surviving node...

Same test, but using a file share witness...

As node 1 loses connectivity to the file share witness, node 2 can take ownership and become the surviving node with my role running.

I am surprised not to see other people reach the same conclusion. Am I wrong somewhere?
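
For anyone wanting to repeat the comparison, the witness type can be flipped on the same cluster between runs; a hedged sketch, assuming Server 2012 or later, with placeholder disk and share names:

    Set-ClusterQuorum -DiskWitness "Cluster Disk 1"               # run the test with a disk witness
    Set-ClusterQuorum -FileShareWitness "\\FS01\ClusterWitness"   # run it again with a file share witness
    Get-ClusterQuorum                                             # confirm which witness is active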

Thanks for sharing your thoughts 
